ABSTRACT


There  are  between  150  and  200  parameters  for  measuring  the  performance  of  ship 
maintenance  processes  in  the  U.S.  Navy.  Despite  this  level  of  detail,  budgets  and 
timelines  for  performing  maintenance  on  the  Navy’s  fleet  appear  to  be  problematic. 
Making  sense  of  what  these  parameters  mean  in  terms  of  the  overall  performance  of  ship 
maintenance  processes  is  clearly  a  big  data  problem. 

The  current  process  for  presenting  data  on  the  more  than  150  parameters 
measuring  ship  maintenance  performance  costs  and  processes,  containing  billions  of  data 
points,  is  still  done  by  static,  cumbersome  spreadsheets.  The  central  goal  of  this  thesis  is 
to  provide  a  means  to  aggregate  voluminous  maintenance  data  in  such  a  way  that  the 
causal  factors  contributing  to  cost  and  schedule  overruns  can  be  better  understood  by  ship 
maintenance  leadership. 

Big  data  visualization  software  was  examined  to  determine  if  visualization  tools 
could  improve  the  understanding  of  U.S.  Navy  ship  maintenance  by  its  leaders.  This 
thesis  concludes  that  the  visualization  of  big  data  supports  decision  making  by  enabling 
leaders  to  quickly  identify  trends,  develop  a  better  understanding  of  the  problem  space, 
establish  defensible  baselines  for  monitoring  activities,  perform  forecasting,  and  evaluate 
metrics  for  use. 




TABLE OF CONTENTS

I. INTRODUCTION
   A. OVERVIEW
II. LITERATURE REVIEW
   A. BIG DATA
   B. THE BIG DATA ECOSYSTEM
   C. BIG DATA TECHNOLOGIES AND TOOLS
   D. GOVERNMENT SPENDING ON BIG DATA
   E. BIG DATA PROJECTS IN GOVERNMENT
   F. GOVERNMENT BIG DATA CASE STUDIES
   G. LESSONS LEARNED
   H. BIG DATA IN THE U.S. NAVY
III. SHIP MAINTENANCE VIGNETTES
   A. INTRODUCTION
   B. MAINTENANCE AND MODERNIZATION SPENDING
   C. MAINTENANCE VIGNETTES
      1. New Work Vignette: USS Iwo Jima
      2. Deferred Maintenance Vignette: USS Bataan and USS Iwo Jima
      3. Modernizations Vignettes: USS Iwo Jima and USS Wasp
   D. SUMMARY
IV. SHIP MAINTENANCE SIMULATIONS
   A. OVERVIEW
   B. MAINTENANCE COST CATEGORIES
   C. DATA COLLECTION
   D. FINAL SIMULATION RESULTS INCORPORATING DIFFERENT COMBINATIONS OF TECHNOLOGIES INTO U.S. NAVY SHIP MAINTENANCE PROGRAMS
   E. VISUALIZATION SOFTWARE ANALYSIS OF U.S. NAVY SHIP MAINTENANCE
      1. Visualization Model
      2. Definitized Estimate, All Ships
      3. Definitized Estimates of the Top 5 Ships
      4. Definitized Estimates of Top 5 Ships by Expense Details
      5. Actual Costs of the Top 5 Ships by Type Expense
      6. Definitized Estimate versus Actual of the Top 5 Ships by Type Expense
      7. Definitized Estimate versus Actual of the Top 5 Ships by Type Expense
      8. Definitized Estimate versus Actual of the Top 5 Ships by Work
      9. Simulation 1 and 2: Introduction of 3DP and AM Radical
         a. Actual versus 3DP for the Top 5 Ships by Type Expense
         b. Actual versus 3DP of the Top 5 Ships by Type Expense, Work
         c. Actual versus AM Radical of the Top 5 Ships by Type Expense
         d. Actual versus AM Radical of the Top 5 Ships by Type Expense, Work
      10. Alternative Figures
         a. Definitized Estimate versus Actual of the Type Expense by Work
         b. Definitized Estimate versus Actual of the Work by Ship
         c. Actual versus AM Radical of the Work by Ship
      11. LOD and Availability Density Bubble Charts
         a. LOD versus Expense (Actual Cost)
         b. LOD versus Expense (Actual Cost) - Highlighted
         c. Availability Density versus Expense (Actual Cost)
      12. Drill Down Spreadsheets
   F. SUMMARY
V. CONCLUSIONS AND RECOMMENDATIONS
   A. CONCLUSIONS
   B. RECOMMENDATIONS
APPENDIX. BIG DATA IMPLICATIONS FOR ENTERPRISE ARCHITECTURE
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST


LIST OF FIGURES

Figure 1. The Digital Universe (from Gantz & Reinsel, 2012)
Figure 2. Big Data Revenue by Type (from Kelly et al., 2013)
Figure 3. Big Data Revenue by Component (from Kelly et al., 2013)
Figure 4. Big Data Market Projection by Segment (from Kelly et al., 2013)
Figure 5. Bar Chart (from Choy, Chawla, & Whitman, 2012)
Figure 6. Box Plot (from Choy et al., 2012)
Figure 7. Bubble Plot (from Choy et al., 2012)
Figure 8. Correlation Matrix (from Choy et al., 2012)
Figure 9. Cross-Tabulation Chart (from Choy et al., 2012)
Figure 10. Clustergram (from Manyika et al., 2011)
Figure 11. Geo Map (from Choy et al., 2012)
Figure 12. Heat Map (from Choy et al., 2012)
Figure 13. Histogram (from Choy et al., 2012)
Figure 14. History Flow (from Manyika et al., 2011)
Figure 15. Line Chart (from Choy et al., 2012)
Figure 16. Pareto Chart (from Choy et al., 2012)
Figure 17. Scatter Plot (from Choy et al., 2012)
Figure 18. Tag Cloud (from Manyika et al., 2011)
Figure 19. Tree Map (from Choy et al., 2012)
Figure 20. U.S. Government Spending on Big Data (from King, August, 2013)
Figure 21. Systems Supported by DOD Maintenance (from OASD[L&MR], 2011)
Figure 22. U.S. Navy Ship Maintenance Costs (from Department of the Navy, 2012)
Figure 23. Ship Maintenance Work Classifications
Figure 24. Vignette Overview
Figure 25. Project Phases
Figure 26. Visualization Model (from J. Kornitsky, personal communication, November, 2013)
Figure 27. Definitized Estimate, All Ships Solar Graph (from J. Kornitsky, personal communication, November, 2013)
Figure 28. Definitized Estimate, Top 5 Ships Solar Graph (from J. Kornitsky, personal communication, November, 2013)
Figure 29. Definitized Estimate, Top 5 Ships, Expense Detail Solar Graph (from J. Kornitsky, personal communication, November, 2013)
Figure 30. Actual Cost, Top 5 Ships, Type Expense Solar Graph (from J. Kornitsky, personal communication, November, 2013)
Figure 31. Definitized Estimate versus Actual, Top 5 Ships, Type Expense, Solar Graph Close-up (from J. Kornitsky, personal communication, November, 2013)
Figure 32. Definitized Estimate versus Actual, Top 5 Ships, Type Expense Solar Graph (from J. Kornitsky, personal communication, November, 2013)
Figure 33. Definitized Estimate versus Actual, Top 5 Ships, Work Solar Graph (from J. Kornitsky, personal communication, November, 2013)
Figure 34. Actual versus 3DP, Top 5 Ships, Type Expense Solar Graph (from J. Kornitsky, personal communication, November, 2013)
Figure 35. Actual versus 3DP, Top 5 Ships, Type Expense, Work Solar Graph (from J. Kornitsky, personal communication, November, 2013)
Figure 36. Actual versus AM Radical, Top 5 Ships, Type Expense Solar Graph (from J. Kornitsky, personal communication, November, 2013)
Figure 37. Actual versus AM Radical, Top 5 Ships, Type Expense, Work Solar Graph (from J. Kornitsky, personal communication, November, 2013)
Figure 38. Definitized Estimate versus Actual, Type Expense, Work Solar Graph (from J. Kornitsky, personal communication, November, 2013)
Figure 39. Definitized Estimate versus Actual, Work, Ship Solar Graph (from J. Kornitsky, personal communication, November, 2013)
Figure 40. Actual versus AM Radical, Work, Ship Solar Graph (from J. Kornitsky, personal communication, November, 2013)
Figure 41. LOD versus Expense (Actual Cost) Bubble Chart (from J. Kornitsky, personal communication, November, 2013)
Figure 42. LOD versus Expense (Actual Cost) - Highlighted Bubble Chart (from J. Kornitsky, personal communication, November, 2013)
Figure 43. Availability Density versus Expense (Actual Cost) Bubble Chart (from J. Kornitsky, personal communication, November, 2013)
Figure 44. Barry Drill Down, 3 Levels of Detail Drill Down Spreadsheet (from J. Kornitsky, personal communication, 2013)
Figure 45. Barry Drill Down, 4 Levels of Detail Drill Down Spreadsheet (from J. Kornitsky, personal communication, 2013)


LIST OF TABLES

Table 1. Big Data Vendors (from Kelly et al., 2013)
Table 2. Big Data Analyzing Techniques (from Manyika et al., 2011)
Table 3. Big Data Analysis Technologies (from Manyika et al., 2011)
Table 4. High Level Summary of Case Studies (from TechAmerica Foundation, 2012)
Table 5. Cost Comparison by Ship (after J. Kornitsky, personal communication, November, 2013)
Table 6. Cost Comparison by Work (after J. Kornitsky, personal communication, November, 2013)


LIST OF ACRONYMS AND ABBREVIATIONS

3D          three dimensional
3DP         three-dimensional printing
ADAMS       Anomaly Detection at Multiple Scales
AM          additive manufacturing
CANES       Consolidated Afloat Networks and Enterprise Services
CINDER      cyber insider
CMS         Centers for Medicare and Medicaid Services
CPLM        Collaborative Product Lifecycle Management
DARPA       Defense Advanced Research Projects Agency
DDG         guided missile destroyer
DECKPLATE   Decision Knowledge Programming for Logistics Analysis and Technical Evaluation
DHS         Department of Homeland Security
DLH         direct labor hours
DM          deferred maintenance
DOD         Department of Defense
DON         Department of Navy
EA          enterprise architecture
ERA         Electronic Records Archive
G           growth
HSE         Homeland Security Enterprise
IRS         Internal Revenue Service
IT          information technology
KVA         knowledge value added
L&MR        Logistics & Material Readiness
LOD         lost operating days
LST         laser scanning technology
MGI         McKinsey Global Institute
NARA        National Archives and Records Administration
NASA        National Aeronautics and Space Administration
NAVAIR      Naval Air Systems Command
NAVSEA      Naval Sea Systems Command
NG          new growth
NOAA        National Oceanic and Atmospheric Administration
NW          new work
NoSQL       not only structured query language
NSSA        Norfolk Ship Support Activity
NPS         Naval Postgraduate School
OASD        Office of the Assistant Secretary of Defense
OW          original work
PCD         project completion date
PEO         Program Executive Office
PROCEED     Programming Computation on Encrypted Data
RFI         request for information
RMC         regional maintenance centers
ROI         return on investment
SME         subject matter expert
STIMS       Surface Team One Metrics System
TB          terabyte
USS         United States Ship
UT          ultrasonic testing
VIRAT       Video and Image Retrieval Analysis Tool


EXECUTIVE  SUMMARY 


The extraordinary demand placed on U.S. armed forces requires that the highest levels of
readiness be maintained. The pressure to reduce costs while maintaining those levels of
readiness compels each of our military services to periodically review internal
processes to ensure responsible use of our nation’s resources. One such process currently
in review involves Department of Defense maintenance programs. In FY2011, the U.S.
Navy spent $682 million maintaining its destroyers, which represent only 22% of the 286
ships currently in the fleet. According to a 2012 Government Accountability Office
report on ship readiness, the U.S. Navy expects to have grown its fleet by another
14 ships, to a total of 300, by 2019. The size of the U.S. Navy’s ship maintenance budget
makes it a prime candidate for review.

Reviewing  ship  maintenance  programs  is  a  complex  task.  There  are  between  150 
and  200  parameters  for  measuring  the  performance  of  ship  maintenance  processes  in  the 
U.S.  Navy.  Despite  this  level  of  detail,  budgets  and  timelines  for  performing  maintenance 
on  the  Navy’s  fleet  appear  to  be  problematic.  Making  sense  of  what  these  parameters 
mean  is  clearly  a  big  data  problem.  Fortunately,  the  value  of  big  data  analysis  has  become 
evident and many analysis solutions exist. Big data visualization was selected for closer
examination, and a sample of U.S. Navy ship maintenance availabilities was used to
explore the technique.

Big  data  visualization  software  was  examined  to  determine  if  visualization  tools 
could  improve  the  understanding  of  U.S.  Navy  ship  maintenance  by  its  leaders.  This 
thesis  concludes  that  the  visualization  of  big  data  supports  decision  making  by  enabling 
leaders  to  quickly  identify  trends,  develop  a  better  understanding  of  the  problem  space, 
establish  defensible  baselines  for  monitoring  activities,  perform  forecasting,  and  evaluate 
metrics for use. U.S. Navy ship maintenance decision makers who want to improve the
speed and accuracy of their decisions should consider the use of visualization software.
To optimize the use of big data visualization, this thesis recommends the continued and
expanded collection of data, identification of performance accounting software for
tracking, and the use of forecasting once accurate ship maintenance performance
baselines are established.


I. INTRODUCTION


A.  OVERVIEW 

There  are  between  150  and  200  parameters  for  measuring  the  performance  of  ship 
maintenance  processes  in  the  U.S.  Navy.  Despite  this  level  of  detail,  budgets  and 
timelines  for  performing  maintenance  on  the  Navy’s  fleet  appear  to  be  problematic. 
Making  sense  of  what  these  parameters  mean  in  terms  of  the  overall  performance  of  ship 
maintenance  processes  is  clearly  a  big  data  problem. 

Program Executive Office (PEO) SHIPS asked a team from the Naval Postgraduate School
(NPS) to work with naval ship maintenance metrics groups to
provide additional options regarding how large data sets could be optimized. The current
process  for  presenting  data  on  the  more  than  150  parameters  measuring  ship  maintenance 
performance  costs  and  processes,  containing  billions  of  data  points,  is  still  done  by  static, 
cumbersome  spreadsheets.  The  central  goal  of  this  thesis  is  to  provide  a  means  to 
aggregate  voluminous  maintenance  data  in  such  a  way  that  the  causal  factors  contributing 
to  cost  and  schedule  overruns  can  be  better  understood  by  ship  maintenance  leadership. 
By  providing  this  kind  of  information  in  an  intuitively  visual  form,  leadership  could  be 
assisted  in  budget  and  scheduling  decision  making. 

The results of the project are in this report. In the first section, we review the big
data world by looking at what was an $11 billion industry in 2012. We examine the issues,
components, technologies, and tools surrounding big data. The next section focuses on big
data and the federal government, which spent approximately $5 billion in 2012 on
national security and military applications. Included in this section are public sector big
data projects, case studies, and lessons learned. Vignettes are presented in Section 3 to
provide a framework for understanding ship maintenance activities in the U.S. Navy.
Section 4 illustrates the power of big data visualization software, with data provided by
naval ship maintenance metrics groups. It provides examples of how large data sets could
be optimized with alternative presentation methods showing a ship’s maintenance status,
including all operational costs and schedule deviations from planned maintenance. It
shows how visualization tools can dig deeper into numbers to improve how key
information is summarized and ultimately used in making critical maintenance allocation
decisions. Data were collected on 19 U.S. Navy guided missile destroyers (DDG), covering 21
maintenance availabilities for those DDGs. The information collected included
definitized estimates prepared by subject matter experts (SME) in the planning process,
along with the actual cost and availability data on three maintenance categories. Two
simulations were run to test the potential impact of incorporating select technologies on
ship maintenance processes. Conclusions and recommendations are found in the final
section.


II. LITERATURE REVIEW


A.  BIG  DATA 

The  world  is  exploding  in  digital  data.  IDC  Corporation  predicts  that  from  2005  to 
2020,  the  digital  universe  will  grow  by  a  factor  of  300,  from  130  exabytes  to  40,000 
exabytes,  or  40  zettabytes.  Moreover,  the  digital  universe  will  about  double  every  two 
years  from  now  to  2020;  a  50-fold  growth  in  ten  years  as  seen  in  Figure  1  (Gantz  & 
Reinsel,  2012). 
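
The growth figures quoted above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only and is not part of the cited IDC study; the function names are hypothetical, and the calculation assumes a constant year-over-year growth rate, which the source does not state.

```python
import math


def annual_growth_rate(growth_factor: float, years: float) -> float:
    """Constant yearly growth rate that produces growth_factor over the given years."""
    return growth_factor ** (1.0 / years) - 1.0


def doubling_time_years(annual_rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


# Overall growth quoted in the text: 130 exabytes (2005) to 40,000 exabytes (2020).
overall_factor = 40_000 / 130

# The 50-fold growth over ten years quoted in the text, under the constant-rate assumption.
rate = annual_growth_rate(50.0, 10.0)
doubling = doubling_time_years(rate)

print(f"2005-2020 growth factor: {overall_factor:.0f}x")
print(f"Implied annual growth for 50x in 10 years: {rate:.1%}")
print(f"Implied doubling time: {doubling:.2f} years")
```

Under that assumption, the overall factor comes out close to the quoted factor of 300, and the implied doubling time is somewhat under two years, which is consistent with the statement that the digital universe will about double every two years.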

More than 5 billion people are calling, texting, tweeting, and browsing on mobile
phones worldwide, and 350 million tweets are sent per day (Kelly, Floyer, Vellante, &
Miniman, 2013). Companies around the world are capturing trillions of bytes of
information on customers, suppliers, and operations. The McKinsey Global Institute
(MGI) estimates that global enterprises stored more than 7 exabytes of new data on disk
drives in 2010, while consumers stored more than 6 exabytes of new data on devices such
as PCs and notebooks (Manyika et al., 2011). The U.S. government produced 848
petabytes of data in 2009. Data collected by the U.S. Library of Congress as of April
2011 totals 235 TB.

[Figure: chart titled "The Digital Universe: 50-fold Growth from the Beginning of
2010 to the End of 2020"; vertical axis from 10,000 to 40,000, horizontal axis from 2009 to 2020.]

Figure 1. The Digital Universe (from Gantz & Reinsel, 2012)

For the purposes of our research, we will use MGI’s definition of big data as
datasets whose size is beyond the ability of typical database software tools to capture,
store, manage, and analyze (Manyika et al., 2012). There are many challenges with big
data, including the ability to capture, store, curate, search, transfer, share, analyze, and
visualize the data. This section focuses on the big data ecosystem. It begins with a
discussion of the market size, then discusses some of the tools and technologies used in
big data analysis, and looks at federal government initiatives involving big data.

The total big data market reached $11.59 billion in 2012 and is estimated to grow
at an annual growth rate of 61% to $18.1 billion in 2013, according to Wikibon (Kelly et
al., 2013). Figure 2 shows revenue by type while Figure 3 gives a breakdown by
component. Big data requires the use of software, hardware, and services.


[Figure: pie chart titled "Big Data Revenue by Type, 2012 (in $US billions)" (n = $11,559).
Services: $5,042 (44%); Hardware: $4,104 (37%); Software: $2,249 (19%).]

Figure 2. Big Data Revenue by Type (from Kelly et al., 2013)

Figure 3. Big Data Revenue by Component (from Kelly et al., 2013)

In addition, Wikibon predicts the big data market to exceed $47 billion by 2017,
growing at a 31% compound annual growth rate over the five-year period from 2012 to
2017 as seen in Figure 4 (Kelly et al., 2013).

Big Data Market Projection by Segment, 2011-2017 ($US billions)

Segment                                                     2011     2012     2013     2014     2015     2016     2017
Big Data XaaS Revenue                                      $0.36    $0.62    $1.07    $1.78    $2.52    $2.97    $3.31
Big Data Professional Services Revenue                     $2.80    $4.42    $6.98   $10.62   $14.15   $16.17   $17.59
Big Data Application (Analytic and Transactional) Revenue  $0.52    $0.99    $1.90    $3.47    $5.29    $6.48    $7.38
Big Data NoSQL Database Revenue                            $0.07    $0.13    $0.27    $0.50    $0.79    $0.98    $1.12
Big Data SQL Database Revenue                              $0.62    $0.88    $1.25    $1.72    $2.14    $2.36    $2.51
Big Data Infrastructure Software Revenue                   $0.14    $0.24    $0.40    $0.64    $0.88    $1.03    $1.14
Big Data Networking Revenue                                $0.15    $0.23    $0.37    $0.56    $0.75    $0.86    $0.93
Big Data Storage Revenue                                   $1.10    $1.75    $2.76    $4.20    $5.59    $6.39    $6.95
Big Data Compute Revenue                                   $1.53    $2.29    $3.40    $4.89    $6.26    $7.01    $7.53
Total Big Data Revenue                                     $7.3    $11.6    $18.4    $28.4    $38.4    $44.2    $48.5

Figure 4. Big Data Market Projection by Segment (from Kelly et al., 2013)
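
The compound annual growth rate cited for the 2012 to 2017 period can be recomputed from the total revenue row of Figure 4. The Python sketch below is a minimal illustration, not Wikibon's method; the function name `cagr` and the dictionary layout are assumptions made for this example. Because the Figure 4 totals are rounded to one decimal place, the computed rate may differ by a point or two from the 31% quoted in the text.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values separated by the given years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Total Big Data Revenue row of Figure 4, in $US billions.
total_revenue = {
    2011: 7.3, 2012: 11.6, 2013: 18.4, 2014: 28.4,
    2015: 38.4, 2016: 44.2, 2017: 48.5,
}

# Five-year compound rate from 2012 to 2017.
rate_2012_2017 = cagr(total_revenue[2012], total_revenue[2017], 5)

# Year-over-year growth, showing how the projected growth slows over time.
year_over_year = {
    year: total_revenue[year] / total_revenue[year - 1] - 1.0
    for year in sorted(total_revenue)
    if year - 1 in total_revenue
}

print(f"CAGR 2012-2017: {rate_2012_2017:.1%}")
for year, growth in year_over_year.items():
    print(f"{year}: {growth:.1%}")
```

The year-over-year series makes the shape of the projection explicit: rapid growth in the early years followed by a marked slowdown toward 2017.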
B. THE BIG DATA ECOSYSTEM

Fueling  the  growth  in  big  data  sales  are  several  factors: 

•  increased  awareness  of  the  benefits  of  big  data  as  applied  to  industries 
beyond  the  web,  most  notably  financial  services,  pharmaceuticals,  and 
retail; 

•  implementation  of  big  data  analysis  requires  software  such  as  Hadoop, 
NoSQL  (not  only  structured  query  language),  data  stores,  in-memory 
analytic  engines,  and  massively  parallel  processing  analytic  databases; 

•  increasingly  sophisticated  professional  services  practices  that  assist 
enterprises  in  practically  applying  the  big  data  requirements  of  hardware 
and  software  to  business  use  cases; 

•  increased investment in big data infrastructure by massive Web
properties (most notably Google, Facebook, and Amazon) and
government agencies for intelligence and counter-terrorism purposes.
(Kelly et al., 2013, Growth Drivers and Adoption Barriers, para. 3).

Wikibon  has  been  tracking  the  market  size,  following  more  than  60  vendors  that 
include  both  big  data  pure-plays  and  others  for  whom  big  data  is  part  of  multiple  revenue 
sources  (Kelly  et  al.,  2013).  Table  1  is  a  current  list  of  the  vendors. 


Table 1. Big Data Vendors (from Kelly et al., 2013)

2012 Worldwide Big Data Revenue by Vendor ($US millions)

The last three columns give the percentage of each vendor's big data revenue that comes from hardware, software, and services.

Vendor                          Big Data    Total        Big Data Revenue   % Big Data   % Big Data   % Big Data
                                Revenue     Revenue      as % of Total      Hardware     Software     Services
                                                         Revenue            Revenue      Revenue      Revenue
IBM                             $1,306      $103,930     1%                 19%          31%          50%
HP                              $664        $119,895     1%                 34%          29%          38%
Teradata                        $435        $2,665       16%                31%          28%          41%
Dell                            $425        $59,878      1%                 83%          0%           17%
Oracle                          $415        $39,463      1%                 25%          34%          41%
SAP                             $368        $21,707      2%                 0%           67%          33%
EMC                             $336        $23,570      1%                 24%          36%          39%
Cisco Systems                   $214        $47,983      0%                 58%          0%           42%
PwC                             $199        $31,500      1%                 0%           0%           100%
Microsoft                       $196        $71,474      0%                 0%           67%          33%
Accenture                       $194        $29,770      1%                 0%           0%           100%
Palantir                        $191        $191         100%               0%           36%          64%
Fusion-io                       $190        $439         43%                71%          0%           29%
SAS Institute                   $187        $2,954       6%                 0%           59%          41%
Splunk                          $186        $186         100%               0%           71%          29%
Deloitte                        $183        $31,300      1%                 0%           0%           100%
NetApp                          $138        $6,454       2%                 77%          0%           23%
Hitachi                         $130        $112,318     0%                 0%           0%           100%
Opera Solutions                 $118        $118         100%               0%           0%           100%
CSC                             $114        $15,825      1%                 0%           0%           100%
Mu Sigma                        $114        $114         100%               0%           0%           100%
Booz Allen Hamilton             $88         $5,802       1%                 0%           0%           100%
Amazon                          $85         $56,825      0%                 0%           0%           100%
TCS                             $82         $10,170      1%                 0%           0%           100%
Intel                           $76         $53,341      0%                 83%          0%           17%
Capgemini                       $72         $14,020      0%                 0%           0%           100%
MarkLogic                       $69         $78          88%                0%           63%          38%
Cloudera                        $56         $56          100%               0%           47%          53%
Actian                          $46         $46          100%               0%           50%          50%
SGI                             $43         $769         6%                 83%          0%           17%
GoodData                        $38         $38          100%               0%           0%           100%
1010data                        $37         $37          100%               0%           0%           100%
10gen                           $36         $36          100%               0%           42%          58%
Google                          $36         $50,175      0%                 0%           0%           100%
Alteryx                         $36         $36          100%               0%           55%          45%
Guavus                          $35         $35          100%               0%           57%          43%
VMware                          $32         $3,676       1%                 0%           71%          29%
ParAccel                        $24         $24          100%               0%           44%          56%
TIBCO Software                  $24         $1,024       2%                 0%           53%          47%
Informatica                     $24         $812         2%                 0%           63%          37%
MapR                            $23         $23          100%               0%           51%          49%
Pervasive Software              $22         $51          37%                0%           41%          59%
Attivio                         $21         $26          80%                0%           62%          38%
Fractal Analytics               $20         $20          100%               0%           0%           100%
Hortonworks                     $18         $18          100%               0%           50%          50%
Rackspace                       $18         $1,300       1%                 0%           0%           100%
QlikTech                        $16         $321         5%                 0%           74%          26%
DataStax                        $15         $15          100%               0%           59%          41%
Basho                           $14         $14          100%               0%           63%          38%
Microstrategy                   $13         $595         2%                 0%           59%          41%
Tableau Software                $13         $130         10%                0%           59%          41%
Kognitio                        $13         $12          100%               0%           47%          53%
Couchbase                       $12         $12          100%               0%           64%          36%
Datameer                        $10         $10          100%               0%           80%          20%
LucidWorks                      $9          $9           100%               0%           60%          40%
Digital                         $10         $10          100%               0%           51%          49%
Aerospike                       $9          $9           100%               0%           80%          20%
Neo Technology                  $9          $9           100%               0%           62%          38%
Think Big Analytics             $8          $8           100%               0%           0%           100%
Calpont                         $8          $8           100%               0%           60%          40%
RainStor                        $8          $8           100%               0%           67%          33%
SiSense                         $7          $7           100%               0%           40%          60%
Revolution Analytics            $7          $13          56%                0%           55%          45%
Talend                          $6          $51          12%                0%           80%          20%
Jaspersoft                      $6          $31          20%                0%           62%          38%
Juniper Networks                $6          $4,365       0%                 70%          0%           30%
Pentaho                         $6          $31          19%                0%           62%          38%
DDN                             $4          $278         2%                 63%          0%           38%
Actuate                         $5          $137         3%                 0%           63%          37%
Original Device Manufacturers   $2,375      $100,000     2%                 100%         0%           0%
Other                           $1,613      $197,170     1%                 17%          13%          70%
Total                           $11,565     $1,244,602   1%                 37%          19%          44%
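
The "Big Data Revenue as % of Total Revenue" column in Table 1 distinguishes pure-play vendors from those for whom big data is one of several revenue sources. As a minimal sketch of how that column relates to the two revenue columns, the Python example below recomputes the share for a few rows taken from Table 1. The function and variable names are hypothetical, and rounding to the nearest whole percent is an assumption about how the source table was prepared.

```python
# (big data revenue, total revenue) in $US millions, copied from Table 1.
vendor_revenue = {
    "Teradata": (435, 2_665),
    "Fusion-io": (190, 439),
    "SAS Institute": (187, 2_954),
    "Tableau Software": (13, 130),
    "Splunk": (186, 186),
}


def big_data_share_pct(big_data_revenue: float, total_revenue: float) -> float:
    """Big data revenue as a percentage of total revenue."""
    return 100.0 * big_data_revenue / total_revenue


# Round to whole percentages, as the table appears to do (assumption).
shares = {
    vendor: round(big_data_share_pct(big_data, total))
    for vendor, (big_data, total) in vendor_revenue.items()
}

# A vendor whose share is 100% derives all of its revenue from big data.
pure_plays = [vendor for vendor, share in shares.items() if share == 100]

for vendor, share in shares.items():
    print(f"{vendor}: {share}%")
print("Pure-play vendors in this sample:", pure_plays)
```

The same calculation applied to every row would flag entries where the printed percentage and the two revenue figures do not agree, which is a quick consistency check on transcribed tables.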

Big data is generated by a variety of sources, including industry-specific
transactions, machine/sensor indications, web applications, and text (Ferguson, 2013).
Industry-specific transactions can include call records and geographic location data.
Machines generate extremely large volumes of information every day, ranging in
complexity from simple temperature readings to the performance parameters of a
gas-turbine engine. Big data on the web also ranges in format, from machine language
to customer comments on social networks, and is likewise produced in very large
volumes. Text sources can include archived documents, external reports, or customer
account information (Ferguson, 2013).

Because  big  data  comes  from  a  variety  of  sources,  it  also  possesses  characteristics 
which  distinguish  it  from  data  in  the  traditional  context.  Common  terms  used  to  define 
the  qualities  of  big  data  include  volume,  variety,  velocity,  and  value  (Dijcks,  2013).  From 
the  listing  of  sources  above,  one  can  understand  that  the  volume  of  data  generated  on  a 
daily  basis  is  enormous.  For  example,  Dijcks  (2013)  stated  that  just  a  single  jet  engine 
produces  10  terabytes  of  data  in  30  minutes.  Extrapolate  that  example  to  include  all  the 
aircraft  currently  airborne,  and  then  include  all  the  factory  infrastructure  around  the  globe 
collecting  data  on  production,  service  life,  and  maintenance  requirements,  and  the 
enormity  of  big  data  volumes  begins  to  emerge.  Another  characteristic  of  big  data, 
variety,  can  be  directly  translated  from  the  various  sources  into  the  variety  of  data 
formats.  Various  data  formats  require  additional  consideration  to  ensure  the  ability  of  all 
systems  to  share  data.  Velocity,  which  is  related  to  volume,  is  the  frequency  with  which 
big data is created. To illustrate velocity, consider the small size of a single Twitter
feed (140 characters) relative to the large number of feeds generated in a given time
period (Dijcks, 2013). Finally, value is the feature of big data that is important to any
enterprise. Refer to Appendix A for a paper regarding the implications of big data on
enterprise architecture (EA).
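
The volume and velocity examples above can be put on a common footing by converting them to rates. The Python sketch below is illustrative only: it uses the jet engine figure attributed to Dijcks (2013) and the 350 million tweets per day cited earlier from Kelly et al. (2013), and it assumes decimal units (1 TB = 10^12 bytes) and one byte per tweet character, neither of which is stated in the sources.

```python
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_GB = 10**9    # decimal gigabyte (assumption)
BYTES_PER_TB = 10**12   # decimal terabyte (assumption)


def throughput_gb_per_s(total_bytes: float, seconds: float) -> float:
    """Average data rate in gigabytes per second."""
    return total_bytes / seconds / BYTES_PER_GB


# Volume: one jet engine producing 10 TB in 30 minutes.
jet_engine_gb_per_s = throughput_gb_per_s(10 * BYTES_PER_TB, 30 * SECONDS_PER_MINUTE)

# Velocity: 350 million tweets per day, each at most 140 characters.
tweets_per_day = 350_000_000
tweets_per_second = tweets_per_day / SECONDS_PER_DAY
# Upper bound on raw tweet text per day, assuming 1 byte per character.
tweet_text_gb_per_day = tweets_per_day * 140 / BYTES_PER_GB

print(f"Jet engine: {jet_engine_gb_per_s:.2f} GB/s")
print(f"Tweets: {tweets_per_second:.0f} per second")
print(f"Tweet text: at most {tweet_text_gb_per_day:.0f} GB per day")
```

The comparison shows why volume and velocity are treated as separate characteristics: the engine produces a large volume from a single source, while the tweet stream consists of very small records arriving thousands of times per second.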

C.  BIG  DATA  TECHNOLOGIES  AND  TOOLS 

Many techniques can be used to analyze data sets. These techniques, which often draw
upon statistics, computer science, and data science, can be applied to big data to generate
insights into large and diverse datasets, as well as smaller, diverse datasets. Table 2
summarizes some techniques.

Table 2. Big Data Analyzing Techniques (from Manyika et al., 2011)


A/B  testing 

• Technique in which a control group is compared with a variety of test groups in order
to determine what treatments (i.e., changes) will improve a given objective variable.

• Big data enables huge numbers of tests to be executed and analyzed, ensuring that
groups are of sufficient size to detect meaningful (i.e., statistically significant)
differences between the control and treatment groups.

Association rule
learning

•  Set  of  techniques  for  discovering  interesting  relationships,  i.e.,  “association  rules,” 
among  variables  in  large  databases. 

•  These  techniques  consist  of  a  variety  of  algorithms  to  generate  and  test  possible 
rules. 

•  An  application  is  market  basket  analysis,  in  which  a  retailer  can  determine  which 
products  are  frequently  bought  together  and  use  this  information  for  marketing  (a 
commonly  cited  example  is  the  discovery  that  many  supermarket  shoppers  who  buy 
diapers  also  tend  to  buy  beer). 

•  Used  for  data  mining. 

Classification

•  Set  of  techniques  to  identify  the  categories  in  which  new  data  points  belong,  based 
on  a  training  set  containing  data  points  that  have  already  been  categorized. 

•  One  application  is  the  prediction  of  segment-specific  customer  behavior  (e.g.,  buying 
decisions,  churn  rate,  consumption  rate)  where  there  is  a  clear  hypothesis  or 
objective  outcome. 

•  These  techniques  are  often  described  as  supervised  learning  because  of  the 
existence  of  a  training  set;  they  stand  in  contrast  to  cluster  analysis,  a  type  of 
unsupervised  learning. 

•  Used  for  data  mining. 

Cluster analysis

•  Statistical  method  for  classifying  objects  that  splits  a  diverse  group  into  smaller 
groups  of  similar  objects,  whose  characteristics  of  similarity  are  not  known  in 
advance. 

•  An  example  of  cluster  analysis  is  segmenting  consumers  into  self-similar  groups  for 
targeted  marketing. 

•  This  is  a  type  of  unsupervised  learning  because  training  data  are  not  used. 

•  Used  for  data  mining. 

Crowdsourcing 

• Technique for collecting data submitted by a large group of people or community (i.e.,
the “crowd”) through an open call, usually through networked media such as the
Web.

•  This  is  a  type  of  mass  collaboration  and  an  instance  of  using  Web  2.0. 

Data  fusion  and 
data  integration 

•  Set  of  techniques  that  integrate  and  analyze  data  from  multiple  sources  in  order  to 
develop  insights  in  ways  that  are  more  efficient  and  potentially  more  accurate  than  if 
they  were  developed  by  analyzing  a  single  source  of  data. 

•  Signal  processing  techniques  can  be  used  to  implement  some  types  of  data  fusion. 

•  One  example  of  an  application  is  sensor  data  from  the  Internet  of  Things  being 
combined  to  develop  an  integrated  perspective  on  the  performance  of  a  complex 
distributed  system  such  as  an  oil  refinery. 

•  Data  from  social  media,  analyzed  by  natural  language  processing,  can  be  combined 
with  real-time  sales  data,  in  order  to  determine  what  effect  a  marketing  campaign  is 
having  on  customer  sentiment  and  purchasing  behavior. 

Data  mining 

•  Set  of  techniques  to  extract  patterns  from  large  datasets  by  combining  methods  from 
statistics  and  machine  learning  with  database  management. 

•  These  techniques  include  association  rule  learning,  cluster  analysis,  classification, 
and  regression. 

•  Applications  include  mining  customer  data  to  determine  segments  most  likely  to 
respond  to  an  offer,  mining  human  resources  data  to  identify  characteristics  of  most 
successful  employees,  or  market  basket  analysis  to  model  the  purchase  behavior  of 
customers.

Ensemble learning

•  Using  multiple  predictive  models  (each  developed  using  statistics  and/or  machine 
learning)  to  obtain  better  predictive  performance  than  could  be  obtained  from  any  of 
the  constituent  models. 

•  This  is  a  type  of  supervised  learning. 

Genetic algorithms

•  Technique  used  for  optimization  that  is  inspired  by  the  process  of  natural  evolution  or 
“survival  of  the  fittest.” 

•  Potential  solutions  are  encoded  as  “chromosomes”  that  can  combine  and  mutate. 

•  These  individual  chromosomes  are  selected  for  survival  within  a  modeled 
“environment”  that  determines  the  fitness  or  performance  of  each  individual  in  the 
population.

•  Often  described  as  a  type  of  “evolutionary  algorithm,”  these  algorithms  are  well-suited 
for  solving  nonlinear  problems. 

• Examples of applications include improving job scheduling in manufacturing and
optimizing  the  performance  of  an  investment  portfolio. 

Machine  learning 

•  Subspecialty  of  computer  science  (within  a  field  historically  called  “artificial 
intelligence”)  concerned  with  the  design  and  development  of  algorithms  that  allow 
computers to evolve behaviors based on empirical data.

• A major focus of machine learning research is to automatically learn to recognize
complex  patterns  and  make  intelligent  decisions  based  on  data.  Natural  language 
processing  is  an  example  of  machine  learning. 

Natural  language 
processing  (NLP) 

•  Set  of  techniques  from  a  subspecialty  of  computer  science  (within  a  field  historically 
called  “artificial  intelligence”)  and  linguistics  that  uses  computer  algorithms  to  analyze 
human  (natural)  language. 

•  Many  NLP  techniques  are  types  of  machine  learning. 

• One application of NLP is using sentiment analysis on social media to determine how
prospective  customers  are  reacting  to  a  branding  campaign. 

Neural  networks 

•  Computational  models,  inspired  by  the  structure  and  workings  of  biological  neural 
networks (i.e., the cells and connections within a brain), that find patterns in data.

•  Neural  networks  are  well-suited  for  finding  nonlinear  patterns. 

•  Can  be  used  for  pattern  recognition  and  optimization.  Some  neural  network 
applications  involve  supervised  learning  and  others  involve  unsupervised  learning. 

•  Examples  of  applications  include  identifying  high-value  customers  that  are  at  risk  of 
leaving  a  particular  company  and  identifying  fraudulent  insurance  claims. 

Network  analysis 

• Set of techniques used to characterize relationships among discrete nodes in a graph
or a network.

• In social network analysis, connections between individuals in a community or
organization are analyzed, e.g., how information travels, or who has the most
influence over whom.

• Examples of applications include identifying key opinion leaders to target for
marketing, and identifying bottlenecks in enterprise information flows.

Optimization 

• Portfolio of numerical techniques used to redesign complex systems and processes
to  improve  their  performance  according  to  one  or  more  objective  measures  (e.g., 
cost,  speed,  or  reliability). 

•  Examples  of  applications  include  improving  operational  processes  such  as 
scheduling,  routing,  and  floor  layout,  and  making  strategic  decisions  such  as  product 
range  strategy,  linked  investment  analysis,  and  R&D  portfolio  strategy. 

• Genetic algorithms are an example of an optimization technique.

Pattern recognition

• Set of machine learning techniques that assign some sort of output value (or label)
to a given input value (or instance) according to a specific algorithm.

•  Classification  techniques  are  an  example. 

Predictive modeling

•  A  set  of  techniques  in  which  a  mathematical  model  is  created  or  chosen  to  best 
predict  the  probability  of  an  outcome. 

• Example of an application in customer relationship management is the use of
predictive  models  to  estimate  the  likelihood  that  a  customer  will  “churn”  (i.e.,  change 
providers)  or  the  likelihood  that  a  customer  can  be  cross-sold  another  product. 

• Regression is one example of the many predictive modeling techniques.

Regression

•  Set  of  statistical  techniques  to  determine  how  the  value  of  the  dependent  variable 
changes  when  one  or  more  independent  variables  is  modified. 

•  Often  used  for  forecasting  or  prediction. 

•  Examples  of  applications  include  forecasting  sales  volumes  based  on  various  market 
and  economic  variables  or  determining  what  measurable  manufacturing  parameters 
most  influence  customer  satisfaction. 

•  Used  for  data  mining. 

Sentiment analysis

•  Application  of  natural  language  processing  and  other  analytic  techniques  to  identify 
and  extract  subjective  information  from  source  text  material. 

•  Key  aspects  of  these  analyses  include  identifying  the  feature,  aspect,  or  product 
about  which  a  sentiment  is  being  expressed,  and  determining  the  type,  “polarity”  (i.e., 
positive,  negative,  or  neutral)  and  the  degree  and  strength  of  the  sentiment. 

•  Examples  of  applications  include  companies  applying  sentiment  analysis  to  analyze 
social  media  (e.g.,  blogs,  microblogs,  and  social  networks)  to  determine  how  different 
customer  segments  and  stakeholders  are  reacting  to  their  products  and  actions. 

Signal processing

•  Set  of  techniques  from  electrical  engineering  and  applied  mathematics  originally 
developed  to  analyze  discrete  and  continuous  signals,  i.e.,  representations  of  analog 
physical  quantities  (even  if  represented  digitally)  such  as  radio  signals,  sounds,  and 
images. 

•  This  category  includes  techniques  from  signal  detection  theory,  which  quantifies  the 
ability  to  discern  between  signal  and  noise. 

•  Sample  applications  include  modeling  for  time  series  analysis  or  implementing  data 
fusion  to  determine  a  more  precise  reading  by  combining  data  from  a  set  of  less 
precise data sources (i.e., extracting the signal from the noise).

Spatial analysis

• Set of techniques, some applied from statistics, which analyze the topological,
geometric, or geographic properties encoded in a data set.

• Often the data for spatial analysis come from geographic information systems (GIS)
that capture data including location information, e.g., addresses or latitude/longitude
coordinates.

•  Examples  of  applications  include  the  incorporation  of  spatial  data  into  spatial 
regressions  (e.g.,  how  is  consumer  willingness  to  purchase  a  product  correlated  with 
location?)  or  simulations  (e.g.,  how  would  a  manufacturing  supply  chain  network 
perform  with  sites  in  different  locations?). 

Statistics 

•  Science  of  the  collection,  organization,  and  interpretation  of  data,  including  the  design 
of  surveys  and  experiments. 

•  Statistical  techniques  are  often  used  to  make  judgments  about  what  relationships 
between  variables  could  have  occurred  by  chance  (the  “null  hypothesis”),  and  what 
relationships between variables likely result from some kind of underlying causal
relationship (i.e., that are “statistically significant”).

• Statistical techniques are also used to reduce the likelihood of Type I errors (“false
positives”) and Type II errors (“false negatives”).

• Example of an application is A/B testing to determine what types of marketing
material  will  most  increase  revenue. 

Supervised learning

•  Set  of  machine  learning  techniques  that  infer  a  function  or  relationship  from  a  set  of 
training  data. 

•  Examples  include  classification  and  support  vector  machines. 

Simulation 

•  Modeling  the  behavior  of  complex  systems,  often  used  for  forecasting,  predicting  and 
scenario planning. Monte Carlo simulations, for example, are a class of algorithms
that rely on repeated random sampling, i.e., running thousands of simulations, each
based  on  different  assumptions. 

•  Result  is  a  histogram  that  gives  a  probability  distribution  of  outcomes. 

•  One  application  is  assessing  the  likelihood  of  meeting  financial  targets  given 
uncertainties  about  the  success  of  various  initiatives. 

Time  series 
analysis 

•  Set  of  techniques  from  both  statistics  and  signal  processing  for  analyzing  sequences 
of  data  points,  representing  values  at  successive  times,  to  extract  meaningful 
characteristics  from  the  data. 

•  Examples  of  time  series  analysis  include  the  hourly  value  of  a  stock  market  index  or 
the number of patients diagnosed with a given condition every day.

• Time series forecasting is the use of a model to predict future values of a time series
based on known past values of the same or other series.

•  Some  of  these  techniques,  e.g.,  structural  modeling,  decompose  a  series  into  trend, 
seasonal,  and  residual  components,  which  can  be  useful  for  identifying  cyclical 
patterns  in  the  data. 

•  Examples  of  applications  include  forecasting  sales  figures,  or  predicting  the  number 
of  people  who  will  be  diagnosed  with  an  infectious  disease. 

Unsupervised learning

•  Set  of  machine  learning  techniques  that  find  hidden  structure  in  unlabeled  data. 

•  Cluster  analysis  is  an  example  of  unsupervised  learning  (in  contrast  to  supervised 
learning). 

Visualization 

•  Techniques  used  for  creating  images,  diagrams,  or  animations  to  communicate, 
understand,  and  improve  the  results  of  big  data  analyses. 
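
As an illustration of one entry in Table 2, the Monte Carlo approach described under Simulation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a model taken from the source: the task names, the duration ranges, the choice of a triangular distribution, and the 30-day deadline are all hypothetical values chosen only to show the mechanics of repeated random sampling and the resulting histogram of outcomes.

```python
import random
from collections import Counter

# Hypothetical tasks: (minimum, most likely, maximum) duration in days.
# These values are illustrative assumptions, not maintenance data.
TASKS = {
    "hull inspection": (4, 6, 12),
    "engine overhaul": (10, 14, 30),
    "systems test": (3, 5, 9),
}


def simulate_total_duration(rng):
    """Draw one duration per task from a triangular distribution and sum them."""
    return sum(rng.triangular(low, high, mode) for low, mode, high in TASKS.values())


def run_simulation(trials=10_000, seed=1):
    """Repeat the random draw many times; each trial is one possible outcome."""
    rng = random.Random(seed)
    return [simulate_total_duration(rng) for _ in range(trials)]


def histogram(samples, bin_width=5):
    """Count samples per bin; each key is the lower edge of a bin."""
    return Counter(int(s // bin_width) * bin_width for s in samples)


def probability_within(samples, deadline):
    """Share of simulated outcomes that finish on or before the deadline."""
    return sum(s <= deadline for s in samples) / len(samples)


if __name__ == "__main__":
    totals = run_simulation()
    for lower, count in sorted(histogram(totals).items()):
        print(f"{lower:>3}-{lower + 5:<3} days: {'#' * (count // 100)}")
    print(f"P(total <= 30 days) = {probability_within(totals, 30):.2f}")
```

The printed bars approximate the probability distribution of total duration, and the final line estimates the likelihood of meeting the assumed deadline.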

A growing number of technologies are used to aggregate, manipulate,
manage, and analyze big data. Some of the more widely used technologies
are listed in Table 3.


Table 3. Big Data Analysis Technologies (from Manyika et al., 2011)


TECHNOLOGY 

COMMENTS 

Big  Table 

•  Proprietary  distributed  database  system  built  on  the  Google  File  System. 

•  Inspiration  for  HBase. 

Business Intelligence

•  A  type  of  application  software  designed  to  report,  analyze,  and  present  data. 

•  Often  used  to  read  data  previously  stored  in  a  data  warehouse  or  data  mart. 

•  Also  used  to  create  standard  reports  that  are  generated  on  a  periodic  basis, 
or  to  display  information  on  real-time  management  dashboards,  i.e., 
integrated  displays  of  metrics  that  measure  the  performance  of  a  system. 

Cassandra 

•  An  open  source  database  management  system  designed  to  handle  huge 
amounts  of  data  on  a  distributed  system. 

•  System  was  originally  developed  at  Facebook  and  is  now  managed  as  a 
project of the Apache Software Foundation.

Cloud Computing

• A computing paradigm in which highly scalable computing resources, often
configured as a distributed system, are provided as a service through a network.

Data  mart 

•  Subset  of  a  data  warehouse,  used  to  provide  data  to  users  usually  through 
business  intelligence  tools. 

Data warehouse

•  Specialized  database  optimized  for  reporting,  often  used  for  storing  large 
amounts  of  structured  data. 

• Data are uploaded using ETL (extract, transform, and load) tools from
operational data stores, and reports are often generated using business
intelligence tools.

Distributed system

•  Multiple  computers,  communicating  through  a  network,  used  to  solve  a 
common  computational  problem. 

•  Problem  is  divided  into  multiple  tasks,  each  of  which  is  solved  by  one  or 
more  computers  working  in  parallel. 

•  Benefits  of  distributed  systems  include  higher  performance  at  a  lower  cost 
(i.e.,  because  a  cluster  of  lower-end  computers  can  be  less  expensive  than 
a  single  higher-end  computer),  higher  reliability  (i.e.,  because  of  a  lack  of  a 
single  point  of  failure),  and  more  scalability  (i.e.,  because  increasing  the 
power  of  a  distributed  system  can  be  accomplished  by  simply  adding  more 
nodes rather than completely replacing a central computer).

Dynamo 

•  Proprietary  distributed  data  storage  system  developed  by  Amazon. 

Extract, 
transform,  and 
load  tools 

• Software tools used to transfer data from one location and integrate them into the
data set.

Google  File 
System 

• Proprietary distributed file system developed by Google; part of the
inspiration for Hadoop.

Hadoop 

•  Open  source  software  framework  for  processing  huge  datasets  on  certain 
kinds of problems on a distributed system. Its development was inspired by
Google’s  MapReduce  and  Google  File  System.  It  was  originally  developed 
at  Yahoo!  and  is  now  managed  as  a  project  of  the  Apache  Software 
Foundation. 

HBase 

•  Open  source,  distributed,  non-relational  database  modeled  on  Google’s  Big 
Table. 

• Originally developed by Powerset and is now managed as a project of the
Apache Software Foundation as part of Hadoop.

MapReduce 

•  Software  framework  introduced  by  Google  for  processing  huge  datasets  on 
certain  kinds  of  problems  on  a  distributed  system. 

•  Also  implemented  in  Hadoop. 

Mashup 

•  Application  that  uses  and  combines  data  presentation  or  functionality  from 
two  or  more  sources  to  create  new  services. 

•  Applications  are  often  made  available  on  the  Web,  and  frequently  use  data 
accessed  through  open  application  programming  interfaces  or  from  open 
data  sources. 

Metadata 

•  Data  that  describes  the  content  and  context  of  data  files,  e.g.,  means  of 
creation,  purpose,  time  and  date  of  creation,  and  author. 

Non-relational database

•  A  database  that  does  not  store  data  in  tables  (rows  and  columns). 

R 

•  Open  source  programming  language  and  software  environment  for 
statistical  computing  and  graphics. 

•  R  language  has  become  a  de  facto  standard  among  statisticians  for 
developing  statistical  software  and  is  widely  used  for  statistical  software 
development  and  data  analysis. 

Relational database

•  Database  made  up  of  a  collection  of  tables  (relations),  i.e.,  data  are  stored 
in  rows  and  columns. 

•  Relational  database  management  systems  (RDBMS)  store  a  type  of 
structured  data. 

•  SQL  is  the  most  widely  used  language  for  managing  relational  databases. 

Semi-structured data

•  Data  that  do  not  conform  to  fixed  fields  but  contain  tags  and  other  markers 
to  separate  data  elements. 

•  Examples  include  XML  or  HTML-tagged  text. 

SQL 

•  Originally  an  acronym  for  structured  query  language,  SQL  is  a  computer 
language  designed  for  managing  data  in  relational  databases. 

•  Technique  includes  the  ability  to  insert,  query,  update,  and  delete  data,  as 
well  as  manage  data  schema  (database  structures)  and  control  access  to 
data  in  the  database. 

Stream processing

•  Technologies  designed  to  process  large  real-time  streams  of  event  data. 

•  Enables  applications  such  as  algorithmic  trading  in  financial  services,  RFID 
event  processing  applications,  fraud  detection,  process  monitoring,  and 
location-based  services  in  telecommunications. 

Structured  data 

•  Data  that  reside  in  fixed  fields. 

•  Examples  include  relational  databases  or  data  in  spreadsheets. 

Unstructured data

• Data that do not reside in fixed fields.

• Examples include free-form text (e.g., books, articles, body of e-mail
messages), untagged audio, image and video data.

Visualization 

•  Technologies  used  for  creating  images,  diagrams,  or  animations  to 
communicate  a  message  that  are  often  used  to  synthesize  the  results  of  big 
data  analyses. 
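
As an illustration of the MapReduce entry in Table 3, the following is a minimal single-process sketch of the map, shuffle, and reduce steps applied to a word count. It does not use the Hadoop or Google APIs; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample text are hypothetical, and the list of chunks only stands in for data that a real distributed system would spread across nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key, here by summing them."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # On a cluster each chunk would be mapped on a different node in parallel;
    # in this sketch the chunks are simply processed one after another.
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    # Hypothetical input split into three chunks.
    chunks = ["pump valve pump", "valve seal", "pump"]
    print(word_count(chunks))
```

Because each chunk is mapped independently and each key is reduced independently, both phases can be divided among many computers, which is the property that makes the framework suitable for a distributed system.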

In working with massive amounts of data, the choice of display and
visualization methods is critical to finding connections and relevance among millions of
parameters and variables, conveying linkages, hypotheses, and metrics, and projecting future
outcomes. Taken one level further, interactive visualization moves visualization from
static spreadsheets and graphics to images capable of drilling down for more detail and
immediately changing how data are presented and processed.

Examples  of  visualization  methods  include: 

•  Bar  charts  are  commonly  used  for  comparing  the  quantities  of  different 
categories  or  groups. 


Figure  5.  Bar  Chart  (from  Choy,  Chawla,  &  Whitman,  2012) 


• Box plots represent a distribution of data values by displaying five statistics
(the minimum, lower quartile, median, upper quartile, and maximum)
that summarize the distribution of a set of data. Extreme values are
represented by whiskers extending from the edges of the box.


Figure 6. Box Plot (from Choy et al., 2012)


• Bubble plots are variations of a scatter plot in which the data markers are
replaced with bubbles, with each bubble representing an observation (or
group of observations). They are useful for data sets with many values or when
values differ by orders of magnitude.


Figure 7. Bubble Plot (from Choy et al., 2012)


• Correlation matrices combine big data with fast response times to identify
quickly which variables among millions or billions are related. They also
show the strength of the relationship between variables.


•  Cross-tabulation  charts  show  frequency  distributions  or  other  aggregate 
statistics  for  the  intersections  of  two  or  more  category  data  items. 
Crosstabs  enable  examination  of  data  for  intersections  of  hierarchy  nodes 
or  category  values. 




Figure  9.  Cross-Tabulation  Chart  (from  Choy  et  al.,  2012) 


• Clustergrams display how individual members of a dataset are assigned to
clusters as the number of clusters increases.


Figure 10. Clustergram (from Manyika et al., 2011)

• Geo maps display data as a bubble plot overlaid on a geographic map.
Each bubble is located either at the center of a geographic region or at
location coordinates.


Figure  11.  Geo  Map  (from  Choy  et  al.,  2012) 

•  Heat  maps  display  distribution  of  values  for  two  data  items  using  a  table 
with  colored  cells.  Colors  are  used  to  communicate  relationships  between 
data values.

• Histograms are variations of bar charts using rectangles to show the
frequency  of  data  items  in  successive  numerical  intervals  of  equal  size. 
They  are  often  used  to  quickly  show  distribution  of  values  in  large  data 
sets. 


Figure 13. Histogram (from Choy et al., 2012)


•  History  flow  charts  show  the  evolution  of  a  document  edited  by  multiple 
contributing  authors.  Time  appears  on  the  horizontal  axis,  while 
contributions  to  the  text  are  on  the  vertical  axis;  each  author  has  a  different 
color  code  and  the  vertical  length  of  a  bar  indicates  the  amount  of  text 
written  by  each  author. 


Figure 14. History Flow (from Manyika et al., 2011)


•  Line  charts  show  the  relationship  of  one  variable  to  another  by  using  a  line 
that  connects  the  data  values.  They  are  most  often  used  to  track  changes  or 
trends over time.


Figure 15. Line Chart (from Choy et al., 2012)


•  Pareto  charts  are  a  specialized  type  of  vertical  bar  chart  where  values  of 
the  dependent  variables  are  plotted  in  decreasing  order  of  frequency  from 
left  to  right.  They  are  used  to  quickly  identify  when  certain  issues  need 
attention. 


•  Scatter  plots  are  two-dimensional  plots  showing  joint  variation  of  two  (or 
three)  variables  from  a  group  of  table  rows.  They  are  useful  for  examining 
the relationships, or correlations, between numeric data items.


Figure 17. Scatter Plot (from Choy et al., 2012)


•  Tag  clouds  are  a  weighted  visual  list  in  which  words  appearing  most 
frequently are larger and words appearing less frequently, smaller.


Figure 18. Tag Cloud (from Manyika et al., 2011)


•  Tree  maps  are  a  variation  of  heat  maps  using  rectangles  (tiles)  to  represent 
data  components.  The  largest  rectangle  represents  the  dominant  division  of 
the  data  and  smaller  rectangles  represent  subdivisions. 


Figure 19. Tree Map (from Choy et al., 2012)
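
Two of the charts described above rest on simple summary calculations, which the following minimal Python sketch makes explicit: the five statistics behind a box plot and the equal-width interval counts behind a histogram. The function names and the sample values are hypothetical, and the quartile convention used here (the "inclusive" method of the Python standard library) is an assumption, since visualization tools differ in how they compute quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    # Quartile conventions differ between tools; "inclusive" is one common choice.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


def histogram_counts(values, bins):
    """Count how many values fall in each of `bins` successive equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1  # fall back to 1 when all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the end, so clamp it to the last bin.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


if __name__ == "__main__":
    sample = [9, 1, 8, 2, 7, 3, 6, 4, 5]  # hypothetical observations
    print("five-number summary:", five_number_summary(sample))
    print("histogram counts:", histogram_counts(sample, bins=4))
```

A plotting tool would draw the box from the lower to the upper quartile with a line at the median, and would draw one rectangle per interval with a height equal to its count.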

D. GOVERNMENT SPENDING ON BIG DATA


The  federal  government  is  fueling  the  growth  of  big  data  spending  on  national 
security and military applications. According to Biometrics Research Group (King,
2013), federal agencies spent approximately US$5 billion on big data resources in the
2012 fiscal year, and annual spending is estimated to grow to US$6 billion in 2014. By
2017, that figure is projected to reach US$8 billion, a compound annual growth rate of
roughly 10 percent, as shown in Figure 20.


[Chart: U.S. Government Spending on Big Data; series: Revenue (in billions)]


Figure  20.  U.S.  Government  Spending  on  Big  Data  (from  King,  August,  2013) 
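
The growth rate quoted above can be checked with a short calculation. The sketch below is only an arithmetic illustration using the three spending figures cited from King (2013); the function names `cagr` and `project` are hypothetical helpers, and the formula is the standard definition of a compound annual growth rate.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value reached after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


if __name__ == "__main__":
    # Spending figures in billions of US dollars, as cited from King (2013).
    print(f"2012 to 2014: {cagr(5, 6, 2):.1%}")
    print(f"2012 to 2017: {cagr(5, 8, 5):.1%}")
    print(f"US$5 billion at 10% for 5 years: {project(5, 0.10, 5):.2f}")
```

Both implied rates come out slightly below 10 percent, and compounding US$5 billion at exactly 10 percent for five years gives a little over US$8 billion, so the cited figures are consistent with one another once rounding is taken into account.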

In the near to midterm, Biometrics Research Group (King, 2013) predicts
that most of the spending will go to military applications of the U.S. government, with
federal agencies pursuing more than 150 big data projects (grants, procurements,
or related activities). The agency leading big data research is the U.S. DOD, with more
than 30 projects, in particular the Defense Advanced Research Projects Agency
(DARPA), with nine major projects (King, 2013).

In a recent study sponsored by the company EMC (King, 2013) that surveyed 150
U.S.  government  information  technology  (IT)  executives,  70%  of  respondents  stated  that 
big  data  will  be  critical  to  all  government  operations  within  five  years.  Big  data, 
according  to  the  survey,  has  the  potential  to  save  nearly  $500  billion  or  14%  of  agency 
budgets  across  the  federal  government  by  increasing  efficiency,  enabling  smarter 
decisions,  and  deepening  insight.  However,  only  31%  say  their  agency  has  an  adequate 
big  data  strategy  (King,  2013). 

Government agencies are seeking to make big data a greater part of their missions.
The Department of Homeland Security (DHS) posted a solicitation on July 24, 2013 (DHS,
2013), requesting information from industry to help identify
transformational opportunities to improve mission and operational efficiencies and lower
costs through advanced analytic automation for DHS and the Homeland Security
Enterprise (HSE). The request for information (RFI) read, “The purpose of this RFI is to
ascertain  available  sources  to  provide  widely  used  big  data  infrastructure,  computing, 
storage,  analytics,  and  visualization  capabilities  that  are  based  on  open  source  or 
commonly  available  commercial  technologies  and  represent  technology  options  of  high 
value  to  the  future  of  homeland  security.” 

E.  BIG  DATA  PROJECTS  IN  GOVERNMENT 

In  2012,  the  Obama  administration  announced  the  Big  Data  Research  and 
Development  Initiative  to  help  solve  challenges  by  improving  the  ability  to  extract 
knowledge  and  insights  from  large  and  complex  collections  of  digital  data  (Office  of 
Science  and  Technology  Policy,  2012).  The  initiative’s  objective  is  to  analyze  big  data 
and  achieve  advances  in  several  sectors,  such  as  healthcare,  security,  the  environment, 
education  and  the  sciences.  Six  federal  departments  and  agencies  launched  the  initiative 
with  more  than  $200  million  in  commitments  that  promise  to  greatly  improve  the  tools 
and  techniques  needed  to  access,  organize,  and  glean  discoveries  from  huge  volumes  of 
digital  data. 

The  Big  Data  Research  and  Development  Initiative  was  created  to: 




•  Advance  state-of-the-art  core  technologies  needed  to  collect,  store, 
preserve,  manage,  analyze,  and  share  huge  quantities  of  data. 

•  Harness  these  technologies  to  accelerate  the  pace  of  discovery  in  science 
and  engineering,  strengthen  our  national  security,  and  transform  teaching 
and  learning;  and 

•  Expand  the  workforce  needed  to  develop  and  use  big  data  technologies 
(Office  of  Science  and  Technology  Policy,  2012,  p.  1). 

The  DOD  announced  plans  to  invest  approximately  $250  million  annually  across 
the  military  departments  in  a  series  of  programs  that  will: 

•  Harness  and  utilize  massive  data  in  new  ways  and  bring  together  sensing, 
perception  and  decision  support  to  make  truly  autonomous  systems  that 
can  maneuver  and  make  decisions  on  their  own. 

•  Improve  situational  awareness  to  help  warfighters  and  analysts  and 
provide  increased  support  to  operations.  The  Department  is  seeking  a  100- 
fold  increase  in  the  ability  of  analysts  to  extract  information  from  texts  in 
any  language,  and  a  similar  increase  in  the  number  of  objects,  activities, 
and  events  that  an  analyst  can  observe  (Office  of  Science  and  Technology 
Policy,  2012,  pp.  2-3). 

According  to  King  (2013),  DOD  big  data  programs  include:  XDATA,  Cyber- 
Insider  Threat  (CINDER),  Anomaly  Detection  at  Multiple  Scales  (ADAMS),  Insight, 
Mind’s  Eye,  Machine  Reading,  Mission-oriented  Resilient  Clouds,  Programming 
Computation  on  Encrypted  Data  (PROCEED)  and  Video  and  Image  Retrieval  and 
Analysis  Tool  (VIRAT). 

The XDATA program is a four-year, $25 million-per-year program to develop
computational techniques and software tools for analyzing large volumes of data, both
semi-structured (e.g., tabular, relational, categorical, meta-data) and unstructured (e.g.,
text  documents,  message  traffic).  Some  core  challenges  include  scalable  algorithms  for 
processing  imperfect  data  in  distributed  data  stores  and  effective  human-computer 
interaction  tools  that  are  rapidly  customizable  to  facilitate  visual  reasoning  for  diverse 
missions.  XDATA  envisions  open  source  software  toolkits  for  flexible  software 
development,  enabling  processing  of  large  volumes  of  data  for  use  in  targeted  defense 
applications  (King,  2013,  para.  13). 

The Cyber-Insider Threat (CINDER) program seeks to develop innovative
approaches to detect activities consistent with cyber espionage in military computer
networks. CINDER will apply various models of adversary missions to normal activity
on  internal  networks  as  a  method  to  expose  hidden  operations.  The  program  also  intends 
to  increase  the  accuracy,  rate  and  speed  with  which  cyber  threats  are  detected  (King, 
2013,  para.  6). 

The  Anomaly  Detection  at  Multiple  Scales  (ADAMS)  program  addresses  the 
issue  of  anomaly  detection  and  characterization  in  massive  data  sets.  Data  anomalies  are 
intended  to  cue  collection  of  additional,  actionable  information  in  a  wide  variety  of  real- 
world  contexts.  Initially,  ADAMS  will  focus  on  insider  threat  detection,  in  which 
anomalous  actions  by  an  individual  are  detected  against  a  background  of  routine  network 
activity  (King,  2013,  para.  5). 

The  Insight  program  addresses  key  shortfalls  in  current  intelligence,  surveillance 
and  reconnaissance  systems.  Automation  and  integrated  human-machine  reasoning 
enable operators to analyze greater numbers of potential threats ahead of time-sensitive
situations.  This  program  seeks  to  develop  a  resource  management  system  which 
automatically  identifies  threat  networks  and  irregular  warfare  operations  by  the  analysis 
of  information  from  imaging  and  non-imaging  sensors  and  other  sources  (King,  2013, 
para.  7). 

The  Mind’s  Eye  program  seeks  to  develop  a  capability  for  visual  intelligence  in 
machines. Unlike the traditional study of machine vision, where progress has been made
in recognizing a wide range of objects and their properties (the nouns in the description
of a scene), Mind's Eye seeks to add the perceptual and cognitive underpinnings needed
for  recognizing  and  reasoning  about  the  verbs  in  those  scenes.  Collectively,  these 
technologies  could  enable  a  more  complete  visual  narrative  (King,  2013,  para.  9). 

The  Machine  Reading  program  seeks  to  realize  artificial  intelligence  applications 
by  developing  learning  systems  that  process  natural  text  and  insert  the  resulting  semantic 
representation  into  a  knowledge  base  rather  than  relying  on  expensive  and  time- 
consuming  current  processes  for  knowledge  representation  that  require  expert  and 
associated-knowledge  engineers  to  hand  craft  information  (King,  2013,  para.  8). 

The Mission-oriented Resilient Clouds program aims to address security
challenges  inherent  in  cloud  computing  by  developing  technologies  to  detect,  diagnose 
and  respond  to  attacks  (King,  2013,  para.  10). 

The  Programming  Computation  on  Encrypted  Data  (PROCEED)  research  effort 
targets  a  major  challenge  for  information  security  in  cloud-computing  environments  by 
developing practical methods and associated modern programming languages for
computation on data that remains encrypted the entire time it is in use. Interception by an
adversary would be more difficult if users have the ability to manipulate encrypted data
without first decrypting (King, 2013, para. 11).

The  Video  and  Image  Retrieval  and  Analysis  Tool  (VIRAT)  program  aims  to 
develop  a  system  to  provide  military  imagery  analysts  with  the  capability  to  exploit  the 
vast amount of overhead video content being collected. If successful, VIRAT will enable
analysts to establish alerts for activities and events of interest as they occur. Tools
will  also  be  developed  to  enable  analysts  to  rapidly  retrieve,  with  high  precision  and 
recall,  video  content  from  extremely  large  video  libraries  (King,  2013,  para.  12). 

F.  GOVERNMENT  BIG  DATA  CASE  STUDIES 

Government  agencies  have  implemented  big  data  projects  to  transform  agencies’ 
processes  and  procedures.  The  U.S.  Army,  for  example,  is  already  leveraging  big  data 
technologies  in  conjunction  with  cloud  computing  (Cruz,  2013).  Started  in  April  2009, 
the  U.S.  Army’s  Big  Data  Cloud  program  extends  to  forward  operating  bases,  which  can 
double  as  local  nodes  that  collect  data  from  various  sources.  The  private  cloud,  which 
went live in March 2011, conveys the latest intelligence information to U.S. troops in
Afghanistan  in  real  or  near-real  time  (Cruz,  2013). 

The National Archives and Records Administration (NARA) faces the challenge of
digitizing a huge volume of unstructured data to provide quick access while maintaining the
data in both classified and unclassified environments (TechAmerica Foundation, 2011).
NARA is charged with providing the Electronic Records Archive (ERA) and online,
public access systems for U.S. records and documentary heritage. As of January 2012, NARA
managed approximately 142 terabytes of information, consisting of more than 7 billion objects
and incorporating records from across the federal agencies, Congress, and several presidential
libraries. There are more than 350 million annual hits on its website. In
addition to managing the ERA, NARA must digitize more than 4 million cubic feet of
traditional archival holdings, including about 400 million pages of classified information
scheduled for declassification, pending review with the intelligence community
(TechAmerica Foundation, 2011).

NARA  used  big  data  tools  to  address  those  challenges.  In  conjunction  with 
traditional  data  capture,  digitizing,  and  storage  capabilities,  advanced  big  data  capabilities 
were  used  for  search,  retrieval,  and  presentation,  all  while  supporting  strict  security 
guidelines. The results were faster ingestion and categorization of documents, an improved
end-user experience, and dramatically reduced storage costs (TechAmerica
Foundation, 2011). Other big data cases involving government agencies are summarized
in  Table  4. 

Table 4. High Level Summary of Case Studies
(from TechAmerica Foundation, 2012)

| Agency/Org/Co. Big Data Project Name | Underpinning Technologies | Big Data Metrics | Initial Big Data Entry Point | Public/User Benefits |
|---|---|---|---|---|
| National Archive and Records Administration (NARA) Electronics Records Archive | Metadata, Submission, Access, Repository, Search and Taxonomy applications for storage and archival systems | Petabytes, Terabytes/sec, Semi-structured | Warehouse Optimization; Distributed Info Mgt | Provides ERA and Online Public Access systems for US records & documentary heritage |
| National Aeronautics and Space Administration (NASA) Human Space Flight Imagery | Metadata, Archival, Search and Taxonomy applications for tape library systems, GOTS | Petabytes, Terabytes/sec, Semi-structured | Warehouse Optimization | Provide industry and the public with iconic and historic human spaceflight imagery for scientific discovery, education and entertainment |
| National Oceanic and Atmospheric Administration (NOAA) National Weather Service | HPC modeling, data from satellites, ships, aircraft and deployed sensors | Petabytes, Terabytes/sec, Semi-structured, ExaFLOPS, PetaFLOPS | Streaming Data & Analytics, Warehouse Optimization, Distributed Info Mgt | Provide weather, water, climate data, and forecasts and warnings for the protection of life and property and enhancement of the national economy. |
| Internal Revenue Service (IRS) Compliance Data Warehouse | Columnar database architecture, multiple analytics applications, descriptive, exploratory, and predictive analysis | Petabytes | Streaming Data & Analytics, Warehouse Optimization, Distributed Info Mgt | Provide taxpayers top quality service by helping them to understand and meet their tax responsibilities and enforce the law with integrity and fairness. |
| Centers for Medicare & Medicaid Services (CMS) Medical Records Analytics | Columnar and NoSQL databases, Hadoop being looked at, EHR on the front end, with legacy structured database systems (including DB2 and COBOL) | Petabytes, Terabytes/day | Streaming Data & Analytics, Warehouse Optimization, Distributed Info Mgt | Protect the health of all Americans and ensure compliant processing of insurance claims |


G.  LESSONS  LEARNED 

It  is  useful  to  better  understand  this  somewhat  ambiguous  concept,  big  data,  by 
taking  advantage  of  lessons  learned  by  other  organizations  dealing  with  similar  problems. 
The TechAmerica Foundation Big Data Commission released a study in October 2012 on
how  big  data  can  move  beyond  the  tidal  wave  of  data  and  transform  government.  The 
Commission’s  mandate  was  to  demystify  the  term  big  data  by  defining  its  characteristics, 
describing  the  key  business  outcomes  it  will  serve,  and  providing  a  framework  for  policy 
discussion. Its goal was to provide guidance to the federal government's senior policy and
decision  makers. 

The  Commission  identified  a  number  of  lessons  learned  from  early  government 
big  data  initiatives  (TechAmerica  Foundation,  2012): 

•  The  path  towards  becoming  big  data  “capable”  will  be  iterative  and 
cyclical. 

•  Successful  big  data  initiatives  seem  to  begin  with  a  burning  business  or 
mission  requirement  that  government  leaders  are  unable  to  address  with 
traditional  approaches. 

•  Successful  big  data  initiatives  commonly  start  with  a  specific  and 
narrowly  defined  business  or  mission  requirement,  and  not  a  plan  to 
deploy  a  new  and  universal  technical  platform  to  support  perceived  future 
requirements. 

•  Successful  initiatives  seek  to  address  the  initial  set  of  use  cases  by 
augmenting  current  IT  investments,  but  do  so  with  an  eye  to  leveraging 
these  investments  for  inevitable  expansion  to  support  far  wider  use  cases 
in  subsequent  phases  of  deployment. 

•  Once  an  initial  set  of  business  requirements  has  been  identified  and 
defined,  the  leaders  of  successful  initiatives  then  assess  the  technical 
requirements,  identify  gaps  in  their  current  capabilities,  and  then  plan  the 
investments  to  close  those  gaps. 

•  Successful  initiatives  tend  to  follow  three  “Patterns  of  Deployment” 
underpinned  by  the  selection  of  one  big  data  “entry  point”  that 
corresponds  to  one  of  the  key  characteristics  of  big  data  -  volume,  variety 
and  velocity. 

•  After  completing  their  initial  deployments,  government  leaders  typically 
expand  to  adjacent  use  cases,  building  out  a  more  robust  and  unified  set  of 
core  technical  capabilities.  These  capabilities  include  the  ability  to  analyze 
streaming  data  in  real  time,  the  use  of  Hadoop  or  Hadoop-like 
technologies  to  tap  huge,  distributed  data  sources,  and  the  adoption  of 
advanced  data  warehousing  and  data  mining  software  (TechAmerica 
Foundation,  2012,  p.  7). 

The Commission made the following recommendations for government agency
leaders  to  adopt  when  implementing  big  data  solutions: 

•  Understand  the  “Art  of  the  Possible”  by  reviewing  case  studies  of  prior 
implementations  to  understand  practical  examples. 

•  Identify  two  to  four  key  business  or  mission  requirements  that  big  data  can 
address  for  the  government  agency,  and  define  and  develop  underpinning 
use  cases  that  would  create  value  for  both  the  agency  and  the  public. 

•  Take  inventory  of  “data  assets.”  Explore  the  data  available  both  within  the 
agency  enterprise  and  across  the  government  ecosystem  within  the  context 
of  the  business  requirements  and  the  use  cases. 

•  Assess  current  capabilities  and  architecture  against  what  is  required  to 
support  goals,  and  select  the  deployment  entry  point  that  best  fits  your  big 
data  challenge,  whether  it  is  volume,  variety  or  velocity. 

•  Explore which data assets can be made open and available to the public to
help spur innovation outside the agency (TechAmerica Foundation, 2012,
p. 8).

H.  BIG  DATA  IN  THE  U.S.  NAVY 

The  U.S.  Naval  Air  Systems  Command  (NAVAIR)  has  optimized  its  resources 
with  big  data.  NAVAIR  implemented  the  Decision  Knowledge  Programming  for 
Logistics  Analysis  and  Technical  Evaluation  (DECKPLATE)  system  to  centralize  and 
streamline  management  of  aircraft  fleet  and  aircraft  carriers  deployed  around  the  world 
(Sverdlik,  2012).  DECKPLATE  is  used  to  manage  fleet  resources  during  both  military 
and humanitarian missions. When the Fukushima Daiichi nuclear power plant was
leaking radiation, DECKPLATE was used to determine readiness of the fleet operating in
the area. It also provided real-time data on the danger of radiation exposure to the Navy's
assets during this time (Sverdlik, 2012).

DECKPLATE  provides: 

•  Enterprise-wide  Visibility.  DECKPLATE  uses  about  23  years  of  trend 
analysis  of  aircraft  readiness,  checking  data  on  areas  such  as  aircraft 
maintenance,  flight  usage  and  inventory,  configuration  baseline 
management,  engine  total  asset  visibility,  technical  directives  and  supply 
cost. 

•  Daily  Reporting.  Daily  readiness  reporting  is  provided  with  messages 
going  out  every  day  from  an  aircraft  carrier  deployed  at  sea  concerning 
aircraft status. In 2004, these reports would be correlated on a monthly
basis,  put  on  a  DVD  and  sent  to  commanders  with  the  readiness  status. 

•  Constant  Process  Optimization.  DECKPLATE  provides  on-going 
improvements  of  its  processes.  It  can  provide  data  to  address  problems 
pro-actively,  before  they  occur,  which  the  traditional  reporting  process  did 
not  allow  for. 

•  Changing Logistics Philosophy. Historically, the military wanted 100% of
its assets up 100% of the time and that required expenditures to fix things
that weren't really necessary. With DECKPLATE, an initiative was
created  to  optimize  the  logistics  process  to  having  the  right  assets  with  the 
right  configuration  in  the  right  place  at  the  right  time  (Sverdlik,  2012, 
Enterprise-wide  visibility,  para.  6). 

The next phase for DECKPLATE is binning, in which data would be evaluated on
a more granular level (Sverdlik, 2012). In the binning project, a history of some 200
million maintenance actions would be broken down into the exact types of maintenance
actions required. The historical maintenance actions would then be further broken down
into 15-minute intervals. Was the aircraft awaiting maintenance during that time or was it
awaiting supply? The final objective, identifying exactly how and where time was
spent on the aircraft during the maintenance period, requires a massive amount of data to be
collected and analyzed over a five-year period on approximately 5,000 aircraft (Sverdlik,
2012).
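
The binning idea described above can be illustrated with a minimal Python sketch. It is not DECKPLATE's actual implementation; the status labels, the sample history, and the function names are hypothetical, and the sketch assumes that every status interval starts and ends on a 15-minute boundary.

```python
from datetime import datetime, timedelta

BIN_SIZE = timedelta(minutes=15)  # bin width taken from the text


def bin_status_intervals(intervals, bin_size=BIN_SIZE):
    """Split (start, end, status) intervals into fixed-size bins.

    Returns a list of (bin_start, status) pairs. Assumes each interval
    starts and ends on a bin boundary (an illustrative simplification).
    """
    bins = []
    for start, end, status in intervals:
        t = start
        while t < end:
            bins.append((t, status))
            t += bin_size
    return bins


def time_by_status(bins, bin_size=BIN_SIZE):
    """Total the time spent in each status across all bins."""
    totals = {}
    for _, status in bins:
        totals[status] = totals.get(status, timedelta()) + bin_size
    return totals


# Hypothetical history for one aircraft; not real maintenance data.
history = [
    (datetime(2012, 1, 9, 8, 0), datetime(2012, 1, 9, 9, 0), "awaiting_maintenance"),
    (datetime(2012, 1, 9, 9, 0), datetime(2012, 1, 9, 9, 45), "awaiting_supply"),
    (datetime(2012, 1, 9, 9, 45), datetime(2012, 1, 9, 11, 0), "in_work"),
]

bins = bin_status_intervals(history)
totals = time_by_status(bins)
for status, spent in totals.items():
    print(status, spent)
```

Once each 15-minute bin carries a status, the question of whether the aircraft was awaiting maintenance or awaiting supply reduces to counting bins per status.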

III.  SHIP  MAINTENANCE  VIGNETTES


A.  INTRODUCTION 

Maintenance  is  crucial  to  maintaining  the  Navy’s  fleet  readiness  and  ensuring  that 
the  fleet  reaches  its  expected  service  life.  This  section  provides  three  aspects  of  ship 
maintenance  through  vignettes  to  provide  a  framework  for  understanding  these  types  of 
activities  within  the  Navy.  It  begins  with  a  general  discussion  on  maintenance  and 
modernization  budgets,  and  then  specific  ship  case  examples  are  provided. 

B.  MAINTENANCE  AND  MODERNIZATION  SPENDING 

Maintenance and modernization are essential to deriving the full benefit of DOD assets
and, more importantly, enable the U.S. to respond quickly to security challenges and offer
humanitarian assistance around the world. In FY2010, the DOD spent approximately
$83.7 billion to maintain strategic material readiness for 13,900 aircraft, 800
strategic  missiles,  350,000  ground  combat  and  tactical  vehicles,  283  ships,  and  myriad 
other  DOD  weapon  systems  (Office  of  the  Assistant  Secretary  of  Defense  for  Logistics  & 
Material  Readiness  [OASD(L&MR)],  2011).  Figure  21  shows  the  systems  supported  by 
the  DOD.  Maintenance  was  provided  through  the  efforts  of  approximately  657,000 
military  and  civilian  maintainers  and  thousands  of  commercial  firms. 

[Figure 21 content: 39,000 combat vehicles; 800 strategic missiles; 283 ships; 13,900 aircraft; plus 311,000 tactical vehicles, communications/electronics equipment, support equipment, and other systems]

Figure 21. Systems Supported by DOD Maintenance (from OASD[L&MR], 2011)

Performed at several levels, DOD material maintenance ranges in complexity
from daily system inspections to rapid removal and replacement of components to
complete overhauls or rebuilds of a weapon system. The three levels of maintenance are
as follows: depot-level maintenance for the most complex and extensive work;
intermediate-level maintenance for less complex maintenance activities performed by
operating unit back-shops, base-wide activities, or consolidated regional facilities; and
field-level maintenance, a combination of organizational depot and intermediate levels
(OASD[L&MR], 2011).

In  early  2011,  the  DOD  operated  17  major  depot  activities  and  expended  more 
than  98  million  direct  labor  hours  (DLHs)  annually  (Avdellas,  Berry,  Disano,  Oaks,  & 
Wingrove,  2011).  Property,  plant,  and  equipment  of  DOD  depots  were  valued  at  more 
than  $48  billion  with  an  infrastructure  consisting  of  more  than  5,600  buildings  and 
structures (Avdellas et al., 2011).

To  maintain  readiness  and  ensure  that  the  fleet  reaches  its  expected  service  life, 
the  Navy  spent  $8.5  billion  on  ship  maintenance  in  FY2011.  Figure  22  shows  the  Navy’s 
maintenance  budget. 


| (Dollars in Millions) | FY2011 | FY2012 | FY2013 |
|---|---|---|---|
| Active Forces | | | |
| Ship Maintenance | $4,726 | $4,533 | $5,090 |
| Depot Operations Support | $1,326 | $1,296 | $1,315 |
| Baseline Ship Maintenance (O&M,N) | $6,052 | $5,829 | $6,405 |
| Overseas Contingency Operations | $2,484 | $1,493 | $1,310 |
| Total Ship Maintenance (O&M,N) | $8,536 | $7,322 | $7,715 |
| Percentage of Projection Funded | 100% | 97% | 100% |
| Annual Deferred Maintenance | $0 | $217 | $0 |
| CVN Refueling Overhauls (SCN) | 1,664 | 530 | 1,683 |
| % of SCN Estimates Funded | 100% | 100% | 100% |

Figure 22. U.S. Navy Ship Maintenance Costs (from Department of the Navy, 2012)
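
The subtotals in Figure 22 can be checked with simple arithmetic. The short Python sketch below assumes that the baseline is the sum of ship maintenance and depot operations support, and that the total is the baseline plus overseas contingency operations. That reading is inferred from the row labels rather than stated in the source. The dictionary keys are illustrative names only.

```python
# Figures in millions of dollars, transcribed from Figure 22.
budget = {
    "FY2011": {"ship_maintenance": 4726, "depot_ops_support": 1326,
               "baseline": 6052, "oco": 2484, "total": 8536},
    "FY2012": {"ship_maintenance": 4533, "depot_ops_support": 1296,
               "baseline": 5829, "oco": 1493, "total": 7322},
    "FY2013": {"ship_maintenance": 5090, "depot_ops_support": 1315,
               "baseline": 6405, "oco": 1310, "total": 7715},
}


def rows_are_consistent(row):
    """Check the assumed roll-up: baseline = maintenance + depot support,
    total = baseline + overseas contingency operations (OCO)."""
    baseline_ok = row["ship_maintenance"] + row["depot_ops_support"] == row["baseline"]
    total_ok = row["baseline"] + row["oco"] == row["total"]
    return baseline_ok and total_ok


results = {fy: rows_are_consistent(row) for fy, row in budget.items()}
for fy, ok in results.items():
    print(fy, "consistent" if ok else "inconsistent")
```

Under that assumption all three fiscal years add up, and the FY2011 total of $8,536 million matches the $8.5 billion cited in the text.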

C.  MAINTENANCE  VIGNETTES


Each of the three vignettes describes an aspect of ship maintenance work: new
work (NW), deferred maintenance (DM), and modernizations. A fourth
category, original work, is maintenance for which planning has been completed and which is
included in the maintenance package before the actual maintenance begins; this section
focuses on NW, DM, and modernizations.

NW  is  maintenance  added  to  a  specific  ship’s  availability  after  planning  has  been 
completed (i.e., not part of the original maintenance package). NW can result from
discrepancies that were not discovered until after planning or from work that was not added to
the availability work package until after planning was complete. DM refers to the status
of  maintenance  rather  than  the  time  of  its  inclusion  in  the  maintenance  package  and  may 
be  either  original  work  or  NW.  DM  is  work  that  is  rescheduled  to  be  completed  later  in 
the  current  availability  or  as  part  of  a  future  maintenance  period.  Modernizations  (or 
mods)  are  system  upgrades.  A  modernization  can  range  in  scope  from  a  short-term 
software  upgrade  to  a  long-term  ship  infrastructure  remodeling.  Generally,  the  planning 
for  all  the  modernization  work  to  take  place  is  completed  before  the  availability  begins 
and  is  therefore  classified  as  original  work.  However,  in  the  modernization  vignette 
below,  two  cases  demonstrate  that  situations  can  arise  which  require  modernization  work 
to  become  NW.  Figure  23  shows  the  relationships  among  the  different  categories. 


Figure 23. Ship Maintenance Work Classifications
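
The classification rules described above can be expressed as a small data structure. The following Python sketch is one possible encoding, not a Navy or NSSA system; the class name, field names, and sample work item are hypothetical. It treats NW versus original work as a question of timing, and DM and modernization as independent attributes, following the definitions in the text.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    description: str
    added_after_planning: bool   # True: new work (NW); False: original work
    deferred: bool = False       # DM is a status and can apply to either category
    modernization: bool = False  # system upgrade rather than repair


def classify(item):
    """Return the labels that apply to a work item under the text's definitions."""
    labels = ["NW" if item.added_after_planning else "original work"]
    if item.deferred:
        labels.append("DM")
    if item.modernization:
        labels.append("modernization")
    return labels


# Hypothetical example: a modernization package added after planning was complete.
late_mod = WorkItem("late modernization package", added_after_planning=True,
                    modernization=True)
print(classify(late_mod))
```

Keeping the three attributes separate reflects the point that a deferred item may be either original work or NW, and that a modernization is normally original work but can become NW.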

The three ships used in the vignettes, under the cognizance of Norfolk Ship
Support  Activity  (NSSA),  are  the  United  States  Ship  (USS)  Wasp  (LHD-1),  the  USS 
Bataan  (LHD-5),  and  the  USS  Iwo  Jima  (LHD-7).  First,  LHD-7  will  be  a  case  study  to 
describe  NW.  Second,  to  depict  DM,  both  LHD-5  and  LHD-7  will  be  examples.  Finally, 
LHD-1  and  LHD-7  are  used  to  illustrate  modernizations,  as  shown  in  Figure  24. 


| Vignette | New Work | Deferred Maintenance | Modernizations |
|---|---|---|---|
| USS Wasp (LHD-1) | | | X |
| USS Bataan (LHD-5) | | X | |
| USS Iwo Jima (LHD-7) | X | X | X |

Figure 24. Vignette Overview


The  three  vignettes  that  follow  were  derived  from  two  phone  conversations  with 
David  J.  Furey,  a  civilian  employee  of  NSSA,  on  9  and  11  September,  2013. 

1.  New  Work  Vignette:  USS  Iwo  Jima 

The  USS  Iwo  Jima  is  an  example  requiring  NW,  DM  and  modernization.  In 
addition,  this  case  examines  NW  and  how  complications  from  NW  can  impact  schedules. 
In  this  vignette,  the  focus  is  on  the  rudder  and  the  bilge.  The  rudder,  a  critical  portion  of 
the  ship’s  steering  system,  caused  a  schedule  extension  due  to  degradation  that  was  not 
readily  apparent.  All  appropriate  assessments,  checks,  and  leakage  tests  were  conducted 
by  maintenance  technicians  and  the  results  indicated  the  rudder  was  in  good  condition. 
All  the  tests  associated  with  the  rudder  were  within  specified  parameters  and  the  rudder 
passed the preliminary inspection. Unfortunately, bearing clearance testing, which
analyzes rudder performance over the entire range of operation (full left to full right),
exposed inconsistencies that prior testing had not revealed. Results from the test were irregular
and, upon examination of the rudder bearings, metal debris and rust were discovered. NSSA
ultimately decided to remove and replace the rudder, which resulted
in the availability schedule being extended by 14 days.

NW was also required on the bilge of the USS Iwo Jima. As part of the entire
availability,  high  pressure  washing  was  required  in  the  bilge.  While  performing  this 
evolution,  fuel  piping  was  damaged  and  a  leak  developed.  Ship’s  force,  a  term  which 
describes  the  active  duty  sailors  onboard  the  ship,  repaired  the  damage  by  using  a  soft 
patch.  A  soft  patch  is  a  temporary  repair  method  for  low  pressure  piping.  However, 
NSSA  was  constrained  by  more  restrictive  requirements  and  was  required  to  replace  the 
faulty piping. To determine the extent of the damage, NSSA used ultrasonic testing (UT),
which uses sound wave properties to determine the amount of pipe wall thickness
remaining. NSSA must replace any pipe with less than 50% of its wall remaining. UT
was  performed  and  revealed  40  ft.  of  fuel  piping,  and  an  additional  20  ft.  of  oily  waste 
piping,  that  required  replacement.  The  availability  schedule  was  extended  by  40  days  to 
replace  the  identified  piping. 

2.  Deferred  Maintenance  Vignette:  USS  Bataan  and  USS  Iwo  Jima 

In  these  vignettes,  the  USS  Bataan’s  example  is  related  to  cost-cutting  while  the 
USS  Iwo  Jima’s  example  is  related  to  prioritization.  The  overall  magnitude  of  work  to  be 
accomplished  during  the  USS  Bataan  availability  made  it  a  target  of  cost-cutting  during 
shrinking  fiscal  budgets  in  2012.  A  common  item  to  be  deferred  is  paintwork  and  the 
USS  Bataan  was  not  an  exception.  Much  of  the  tank  paintwork  was  deferred  from  the 
2012  availability  to  the  2015  availability  as  a  result  of  fiscal  cutbacks. 

The  USS  Iwo  Jima  also  experienced  DM,  but  the  maintenance  was  deferred 
because  higher  priorities  required  the  ship  to  be  waterborne.  Specifically,  the  7-K-O-W 
tank,  the  forward  feed  tank  for  the  ship’s  ballast  system,  was  due  for  preservation  and 
required  the  ship  to  remain  in  drydock.  The  tank  had  not  been  opened  since 
commissioning  as  this  was  the  ship’s  first  drydock  availability.  Inspection  revealed  the 
tank to be in Tank Condition 4, meaning that a profound failure had been discovered. UT
showed  that  no  more  than  17%  surface  wastage  had  occurred  and  therefore  the  tank  had 
become  a  candidate  for  deferral.  Higher  priority  maintenance  necessitated  that  the  ship  be 
waterborne,  so  the  drydock  was  flooded  and  the  7-K-O-W  tank  preservation  was 
deferred. 

While the effect on a ship's availability schedule of the addition of NW can be
directly  measured,  the  consequence  of  deferring  maintenance  is  a  matter  of  risk.  The  USS 
Iwo  Jima  added  NW  to  its  availability  and  incurred  schedule  delays,  or  lost  operating 
days  (LOD);  14  days  were  attributed  to  work  on  the  rudder  and  40  days  to  the 
replacement  of  pipe.  In  both  cases,  the  impact  can  be  easily  measured. 

As  for  DM,  the  impact  can  range  from  minimal  to  substantial.  For  instance,  the 
tank  paintwork  for  the  USS  Bataan  was  deferred  until  the  next  planned  availability  in 
2015.  The  paintwork  would  have  cost  a  certain  dollar  amount  in  2012  and  would  have 
provided  the  tank  a  level  of  preservation  protection.  In  2015,  the  paintwork  will  cost 
more  not  only  because  of  inflation  and  the  degradation  of  the  paint  associated  with  time, 
but  also  because  corrosion  will  have  developed  at  a  higher  rate  than  it  would  have  with  a 
fresh  application  of  paint.  The  difference  between  the  cost  of  paintwork  in  2015  versus 
the  cost  in  2012  (including  corrosion  correction)  is  the  impact  of  this  example  of  DM  and 
would  be  comparatively  minimal.  However,  the  possibility  of  a  larger  effect  exists. 
Perhaps  the  development  and  growth  of  corrosion  on  the  7-K-O-W  tank  is 
underestimated.  If  the  corrosion  progresses  significantly  faster,  then  the  likelihood  of 
structural  failure  increases.  Should  the  structural  failure  occur  outside  the  maintenance 
environment  of  the  shipyard,  then  the  impact  would  be  far  greater  and  the  costs 
associated  with  unscheduled  maintenance  much  higher.  The  decision  to  defer  the 
preservation  of  the  tank  must  consider  both  the  likelihood  and  severity  of  all  the  possible 
outcomes.  In  other  words,  the  decision  maker  must  consider  all  the  associated  risks 
before  deferring  maintenance. 
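
One common way to weigh likelihood against severity is an expected-cost comparison. The Python sketch below is an illustration of that general technique, not a method the source attributes to NSSA. Every probability and dollar figure in it is invented for the example.

```python
def expected_cost(outcomes):
    """Return the probability-weighted cost of a set of outcomes.

    outcomes: list of (probability, cost) pairs whose probabilities sum to 1.
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * cost for p, cost in outcomes)


# All values below are hypothetical and chosen only for illustration.
cost_now = 100_000  # cost of performing the preservation in the current availability

deferral_outcomes = [
    (0.90, 130_000),    # corrosion grows as expected; work costs more later
    (0.09, 400_000),    # corrosion is worse than estimated; larger repair
    (0.01, 5_000_000),  # structural failure outside the shipyard
]

expected_deferred = expected_cost(deferral_outcomes)
print(f"perform now: {cost_now:,}")
print(f"defer (expected): {expected_deferred:,.0f}")
```

With these made-up inputs, the rare but severe outcome dominates the comparison, which is why both likelihood and severity have to enter the decision.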

3.  Modernizations  Vignettes:  USS  Iwo  Jima  and  USS  Wasp 

Of the three classifications of shipyard maintenance examined in this section,
modernizations have the greatest potential to affect the schedule. In the cases of the USS
Iwo  Jima  and  USS  Wasp,  modernizations  may  affect  the  timetable  because  not  all  the 
required  drawings  had  been  completed  prior  to  the  start  of  work.  For  the  USS  Iwo  Jima,  a 
single  modernization  will  be  presented,  whereas  the  USS  Wasp  serves  as  a  more  general 
example. However, a brief overview of the shipyard planning evolution will be presented
first  to  explain  the  importance  of  timely  drawings. 

Before  the  shipyard  period  starts,  the  plan  for  a  scheduled  availability  must  be 
completed.  To  complete  the  plan  for  an  availability,  NAVSEA  must  approve  the 
contractor-provided estimate (Department of the Navy [DON], 2012b). To generate the
estimate, however, the contractor must review all the drawings (first-tier and second-tier)
associated with the work to be performed (D. Furey, personal communication, 9 & 11
September,  2013).  First-tier  drawings  are  the  main  focus  of  the  modernization,  whereas 
second-tier  drawings  involve  infrastructure  and  subsystems  related  to  the  work.  For  a 
particular  modernization,  if  all  the  drawings  are  not  completed,  then  the  contractor  cannot 
create the estimate and an approved plan will not exist. In addition, availabilities must
sometimes commence on a partial solution; otherwise, all work would be completed late.
In  the  situation  without  an  approved  plan,  the  project  completion  date  (PCD)  has  a  larger 
margin  of  error  and  schedule  changes  are  more  likely  to  occur. 

This was the case with the CANES installation in the USS Iwo Jima availability.
CANES,  or  Consolidated  Afloat  Networks  and  Enterprise  Services,  as  its  name  implies, 
is  a  program  created  to  consolidate  many  networks  and  services  aboard  ships  into  a  single 
information  technology  system.  Although  not  all  of  the  drawings  were  received,  the 
maintenance  period  started  anyway.  There  was  other  work  to  perform;  CANES  was  not 
the  only  reason  for  the  USS  Iwo  Jima  to  visit  the  shipyard.  As  drawings  for  CANES  were 
completed,  they  were  then  provided  to  the  contractor.  However,  the  plan  for  CANES 
could  not  be  approved  until  all  the  drawings  were  received,  the  contractor  generated  the 
estimate,  and  NAVSEA  accepted  the  plan. 

In the case of the USS Wasp, the estimated modernization cost was extremely
high at $250 million to $300 million. The high cost was partially due to modernizations
needed to accommodate the F-35 Joint Strike Fighter (JSF): the USS Wasp was to
be the first ship to test the JSF, and part of the flight deck had to be strengthened. Not only
was the structural reinforcement of the after flight deck a large package, but the ship was
also undergoing many other modernizations. Unfortunately, the USS Wasp also started
its availability without a complete plan. Twenty modernization packages, including the
structural reinforcement of the after flight deck, were not included in the plan
because the drawings had not yet been delivered.
In addition, NSSA erroneously included one large work item in the plan for which
second-tier drawings had not yet been received. The contractor brought the discrepancy
to NSSA's attention, explaining that the contractor would not be able to complete an
estimate before the plan was completed (also known as 100% lock). NSSA had two
options: extend the lock, or pull the work item out and add it back in later as NW.
NSSA chose the latter.

In both these vignettes, modernizations had significant potential to severely affect
the scheduled PCD because the drawings were not completed. Two questions arise
about the implications of a missed PCD for ship maintenance costs:

•  Is  there  a  cost  premium  to  new  work?  In  other  words,  do  costs  increase 
because  a  modernization  was  added  after  100%  lock? 

•  Are  LODs  caused  by  planning  or  scope?  In  other  words,  is  it  the  planning 
process  or  the  scope  of  work  which  is  to  blame  for  missing  PCD? 

D.  SUMMARY 

The  U.S.  Navy  ship  maintenance  process  is  already  an  enormously  expensive 
endeavor.  Situations  which  result  in  NW  or  DM  only  add  cost  to  the  process  in  the  form 
of  budget  and  schedule  overruns.  The  information  regarding  those  overruns  is  available 
to  decision  makers,  but  only  in  cumbersome,  static  spreadsheets  and  in  very  large 
quantities.  Executive  level  ship  maintenance  decision  makers  need  a  way  to  easily  and 
intuitively  understand  the  information  available  to  them  so  that  decisions  can  be  made 
which  would  reduce  the  occurrence  of  NW  and  DM.  Ship  maintenance  executives  require 
a  big  data  technology  that  would  provide  a  clear  understanding  of  the  relationships 
among  all  the  variables,  specifically  those  which  cause  increased  costs  and  schedule 
overruns.  In  the  next  section  of  this  report,  software  is  used  to  analyze  the  historical 
maintenance  information  of  a  selected  group  of  U.S.  Navy  ships.  It  will  show  how  big 
data  technology  could  be  used  to  provide  decision  makers  with  a  clear,  intuitive 
visualization  of  ship  maintenance  costs. 
IV. SHIP MAINTENANCE SIMULATIONS


A.  OVERVIEW 

A  team  from  the  Naval  Postgraduate  School  (NPS)  was  tasked  by  PEO  SHIPS  to 
work  with  naval  ship  maintenance  metrics  groups  to  provide  additional  options  regarding 
how  large  data  sets  could  be  optimized.  In  particular,  presentation  methods  were 
requested  succinctly  showing  a  ship’s  maintenance  status,  including  all  operational  costs 
and  schedule  deviations  from  planned  maintenance.  Project  sponsors  also  sought 
suggestions  for  improving  how  key  information  could  be  summarized  and  ultimately 
used in making critical maintenance allocation decisions. The current process for
presenting data on the more than 150 parameters measuring ship maintenance
performance costs and processes, which contain billions of data points, still relies on static,
cumbersome Excel spreadsheets.

The project was conducted in three distinct phases, as seen in Figure 25. First, data
were collected on 19 U.S. Navy guided missile destroyers (DDG) with maintenance
periods spanning 2010 to mid-2013. Data were collected on 21 maintenance
availabilities for those DDGs and included definitized estimates prepared by SMEs in the
planning process, along with the actual cost and availability data on three maintenance
categories. A hypothesis was tested and two simulations were run using the Knowledge
Value Added (KVA) methodology in Phase 2. Simulation 1 tested the
potential impact of incorporating three-dimensional (3D) printing (3DP) into ship
maintenance programs, while Simulation 2 evaluated the combination of 3DP plus two
more technologies: 3D laser scanning technology (3D LST) and Collaborative Product
Lifecycle Management (CPLM). In Phase 3, a visualization tool offered by an
independent software vendor was selected to show how large volumes of data could be
shown in a condensed and intuitive manner.

Phase 1: Data Collection

•  Definitized estimates for 19 guided missile destroyers (DDG)

•  21 maintenance availabilities from 2010 to mid-2013

   ■  Actual costs from Surface Team One Metrics System (STIMS)

   ■  Cost categories of Original Work, Growth, New Work, New Growth

Phase 2: Simulations

•  Simulation 1: 3D Printing technology (3DP)

•  Simulation 2: 3D Printing technology (3DP) + 3D Laser Scanning Technology (3D LST) +
   Collaborative Product Lifecycle Management (CPLM)

Phase 3: Analysis & Results

•  Definitized cost estimates for maintenance work ($313.7 m)

•  Actual costs for maintenance work ($435.5 m)

•  Cost estimates after simulations incorporating technologies ($271.1 m)

   ■  Potential cost savings of 37.7% ($164.4 m)

Figure 25. Project Phases


The visualization software provides a compressed overview yielding a higher
level of visual clarity, enabling faster, more intuitive interpretation of ship maintenance
data by presenting the data relationships in diagrams, graphs, and charts. Relationships
among variables are more readily discoverable, and those relationships could be used in
forecasting to develop more accurate maintenance estimates that are based on
historical data. With visualization software, decision makers are able to see analytical
results quickly, find relevance among millions of variables, communicate
concepts and hypotheses to others, and even forecast possible scenarios.

This  section  of  the  report  is  divided  into  several  areas.  Maintenance  categories 
and  the  data  collection  process  are  initially  reviewed.  Final  simulation  results  are 
highlighted  to  provide  a  framework  for  understanding  the  power  of  visualization 
software,  followed  by  a  general  discussion  of  the  original  definitized  cost  estimate 
(Figures  27-29).  Actual  costs  are  then  compared  with  the  definitized  cost  estimates  and 
discrepancies between the two are discussed (Figures 30-34). The potential
effect on ship maintenance costs of incorporating the specific technologies in Simulation 1
and Simulation 2 is then analyzed in greater detail (Figures 35-38). Alternative presentation
methods,  which  drill  down  into  specific  detail,  are  then  explored  (Figures  39-41).  Then,  a 
description  and  analysis  of  a  common  ship  maintenance  metric,  lost  operating  days 
(LOD),  is  given  along  with  a  recommendation  of  an  alternative  and  more  highly 
correlated  metric,  availability  density  (Figures  42-44).  This  section  concludes  with 
further  examples  of  the  drill  down  ability  into  specific  details  that  are  offered  by 
visualization  tools  (Figures  45  and  46). 

B.  MAINTENANCE  COST  CATEGORIES 

There  are  several  cost  categories  for  ship  maintenance:  original  work  (OW), 
growth  (G),  new  work  (NW),  and  new  growth  (NG).  OW  is  the  estimated  ship 
maintenance  cost  (shipyard  or  contractor,  labor  and  material  costs)  at  the  completion  of 
planning  and  is  also  known  as  the  definitized  cost  estimate.  The  definitized  cost  estimate 
is  a  figure  provided  by  a  SME  in  the  planning  process. 

G  is  an  expansion  of  OW  and  can  result  from  many  factors,  including 
undiscovered  discrepancies  or  an  increase  in  scope.  For  example,  the  OW  plan  for  a 
hypothetical  ship  called  for  preservation  work  on  the  ship’s  hull.  While  conducting  the 
preservation  work,  the  maintenance  technician  discovered  hull  damage  that  required 
minor  repair.  The  minor  repair  work  would  be  classified  as  G. 

NW is maintenance that is added to a ship's availability after planning has been
completed (i.e., not part of the original work maintenance package). NW can result from
discrepancies that had not been discovered during planning and are unrelated to previously
planned maintenance, or from work that was not added to the availability work package until
after planning was complete. For example, while conducting preservation work on the
hypothetical ship, the maintenance technician discovered damage to a communication
antenna. The resulting repair work would be classified as NW.

NG is the growth resulting from an expansion in NW, similar to the relationship
between G and OW. For example, the maintenance technician conducting the
antenna repair work discovered that the antenna was beyond repair and needed to be
replaced. Replacement of the antenna would be considered NG.
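
The four definitions above can be sketched as a small decision rule. This is only an illustration of the classification logic as described in this section; the names `WorkCategory` and `classify` and the two input flags are assumptions made for the example and do not come from STIMS or any Navy system.

```python
from enum import Enum
from typing import Optional


class WorkCategory(Enum):
    ORIGINAL_WORK = "OW"
    GROWTH = "G"
    NEW_WORK = "NW"
    NEW_GROWTH = "NG"


def classify(in_definitized_package: bool,
             expands: Optional[WorkCategory] = None) -> WorkCategory:
    """Return the cost category for a work item (illustrative rule only).

    in_definitized_package: True if the item was in the package when planning closed.
    expands: category of the existing item this work grew out of, if any.
    """
    if expands is WorkCategory.ORIGINAL_WORK:
        return WorkCategory.GROWTH
    if expands is WorkCategory.NEW_WORK:
        return WorkCategory.NEW_GROWTH
    if expands is not None:
        # The text defines growth only for OW and NW items.
        raise ValueError("growth is only defined here for OW and NW items")
    if in_definitized_package:
        return WorkCategory.ORIGINAL_WORK
    return WorkCategory.NEW_WORK


# The hypothetical ship from the text.
hull_preservation = classify(in_definitized_package=True)
hull_repair = classify(False, expands=WorkCategory.ORIGINAL_WORK)
antenna_repair = classify(False)
antenna_replacement = classify(False, expands=WorkCategory.NEW_WORK)

for item in (hull_preservation, hull_repair, antenna_repair, antenna_replacement):
    print(item.value)
```

In this sketch the hull preservation, hull repair, antenna repair, and antenna replacement map to OW, G, NW, and NG respectively, matching the examples given above.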



C.  DATA  COLLECTION 

Data  for  this  analysis  were  derived  from  the  Surface  Team  One  Metrics  System 
(STIMS)  website  (https://mfom-shipmain.nmci.navy.mil).  In  particular,  ship 
availabilities  were  selected  for  examination  based  on  several  factors  designed  to  establish 
a  proof  of  concept  for  the  use  of  big  data  to  shape  executive  level  decisions.  The 
availabilities  considered  were  restricted  to  only  U.S.  Navy  DDGs  whose  maintenance 
period  started  by  2010  and  whose  final  reports  were  closed  and  completed  by  the  time 
this study began in 2013. Ships whose close-out reports were incomplete or
missing data were not included in the analysis. This resulted in a sample of 19 DDGs.
Currently, there are 62 DDGs operating in the U.S. Navy (U.S. Navy Fact File, 2014),
so the sample covers about 31% of the DDG fleet.

The  figures  that  follow  are  screen  shots  of  solar  graph  results  that  were  captured 
while  using  the  visualization  software  program  to  process  the  ship  maintenance  data 
obtained from the STIMS website. A subset of the data was put into a spreadsheet for
this study to keep the dataset manageable, prove the hypothesis, and provide input to the
visualization software model. This subset consists of 21 maintenance availabilities for the 19
DDGs.

D. FINAL SIMULATION RESULTS INCORPORATING DIFFERENT
COMBINATIONS OF TECHNOLOGIES INTO U.S. NAVY SHIP
MAINTENANCE PROGRAMS

Two  simulations  were  run  to  show  the  potential  cost  savings  of  incorporating 
specific  technologies.  In  Simulation  1,  only  3DP  technology  was  evaluated  while  in 
Simulation  2,  three  technologies  combined  were  evaluated.  Tables  5  and  6  reflect  the 
differences  between  definitized  costs,  actual  costs,  and  the  costs  projected  for 
Simulations  1  and  2.  The  definitized  cost  estimate  was  $313.7  million,  compared  to  the 
actual cost of $435.5 million. If 3DP, 3D LST, and CPLM technologies combined had been
incorporated into the ship maintenance processes, the costs are estimated to have been
reduced to $271.1 million, a savings of $164.4 million.
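
The arithmetic behind these totals can be checked with a few lines of code. The three totals are the ones reported in this section; the variable names are chosen only for this illustration.

```python
# Totals in millions of dollars, as reported in the text.
definitized = 313.7
actual = 435.5
simulation_2 = 271.1

# Overrun of actual cost over the definitized estimate.
overrun = actual - definitized

# Savings of the combined-technology simulation relative to actual cost.
savings = actual - simulation_2
savings_pct = 100 * savings / actual

print(f"Overrun vs. definitized estimate: ${overrun:.1f} million")
print(f"Savings vs. actual cost: ${savings:.1f} million ({savings_pct:.1f}%)")
```

The savings percentage is taken relative to actual cost, which is the convention used for the "% vs. Actual" figures reported for the simulations.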

Table 5. Cost Comparison by Ship (after J. Kornitsky, personal communication,
November, 2013)

Cost Comparison by Ship (dollars represented in millions)

               Definitized     Actual       % vs.        Simulation 1  % vs.    Simulation 2 Radical  % vs.
Ship           Cost Estimates  Costs        Definitized  (3DP)         Actual   (3DP, 3D LST, CPLM)   Actual
Barry          $48.0           $70.1        46.0%        $65.8         -6.1%    $43.9                 -37.3%
Arleigh Burke  $46.9           $58.0        23.6%        $56.4         -2.6%    $35.7                 -38.4%
Ramage         $46.3           $57.2        23.5%        [illegible]   -2.3%    $35.7                 -37.5%
Donald Cook    $21.4           [illegible]  69.7%        $36.2         -0.2%    $22.9                 -36.7%
Stout          $45.4           $63.2        39.0%        $64.1         1.4%     $38.6                 -38.8%
All Other      $105.6          $150.5       42.5%        $147.5        -1.9%    $94.1                 -37.4%
TOTAL          $313.7          $435.5       38.7%        $426.2        -1.9%    $271.1                -37.7%

Table 6. Cost Comparison by Work (after J. Kornitsky, personal communication,
November, 2013)

Cost Comparison by Work (dollars represented in millions)

               Definitized     Actual       % vs.        Simulation 1  % vs.    Simulation 2 Radical  % vs.
Work           Cost Estimates  Costs        Definitized  (3DP)         Actual   (3DP, 3D LST, CPLM)   Actual
Original       $313.7          $313.7       0.0%         $307.3        -2.0%    $195.4                -37.7%
Growth         $0.0            $47.1        100.0%       $45.7         -3.0%    $28.1                 -40.2%
New Work       $0.0            $66.8        100.0%       $65.5         -1.9%    $43.0                 -35.6%
New Growth     $0.0            $7.7         100.0%       $7.4          -3.9%    $4.5                  -41.6%
TOTAL          $313.7          $435.5       38.7%        $426.2        -2.1%    $271.1                -37.7%

E.  VISUALIZATION  SOFTWARE  ANALYSIS  OF  U.S.  NAVY  SHIP 
MAINTENANCE 

1. Visualization Model

The  visualization  model  (Figure  26)  is  an  overview  of  how  the  DDG  spreadsheet 
data  was  mapped  into  the  software.  It  shows  four  cost  categories  on  top,  all  19  ships  by 
name  in  the  middle,  and  their  combined  availabilities  at  the  bottom.  The  lines  between 
the  boxes  depict  connection  relationships. 

Each of the 24 boxes in the model has a number above it that represents
the aggregate cost. For example, the box on the middle left side of Figure 26, labeled
Stout, indicates $28.1 million of aggregate cost attributed to the availability. In addition,
the horizontal bar between the cost number and the box represents the relative portion of
cost attributed to that availability when compared with all availabilities. Likewise, for the box in the
top left corner, labeled Growth, the length of the bar shown reflects the relative cost of
that type of work.

At the bottom of each box, the number of connections to all other variables is
depicted in two ways: by the number displayed in the bottom right of the box and by the
number of ovals displayed in the bottom left of the box. The box at the bottom of
Figure 26, labeled Avail, indicates 21 connections to all other variables with both the
number, 21, and the number of ovals displayed, 21.

At  the  top  of  Figure  26,  four  boxes  are  depicted  and  represent  one  category  of 
cost,  type  of  work.  The  labels  on  each  box  indicate  a  particular  type  of  work,  G,  NG, 
NW,  and  OW.  Each  particular  type  of  work  accounted  for  the  amount  of  cost  indicated. 

In the middle of Figure 26, the 19 boxes labeled with ship names indicate the
maintenance cost each ship incurred. For 17 of the ships, the ship maintenance cost was
attributed to a single availability. For the Arleigh Burke and the Donald Cook, the ship
maintenance cost was attributed to two availabilities. For example, the box labeled
Arleigh Burke in Figure 26 indicates $35.7 million in ship maintenance cost, but for two
unique availability periods. This can be verified by referencing the number in the
lower-right portion of the ship name boxes. For most of the ships, this number is 4, and
four ovals are displayed. This represents the number of connections to the kinds of
cost: in any single availability, four types of work (cost) are identified (OW, G,
NW, and NG). For the Arleigh Burke and the Donald Cook, two
availabilities were recorded and, therefore, there are eight connections to the four types
of work (cost), as indicated by the number 8 and the eight ovals in each of their boxes in
Figure 26.
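
The connection counts described above follow from a simple product: each availability links a ship to each of the four types of work. The following minimal sketch reproduces the counts quoted in the text; the function name and the way the ship counts are stored are assumptions for illustration, not features of the visualization software.

```python
WORK_TYPES = ("OW", "G", "NW", "NG")


def connection_count(num_availabilities: int) -> int:
    """Connections from one ship box to the work-type boxes."""
    return num_availabilities * len(WORK_TYPES)


# As described in the text: 17 ships with one availability, 2 ships with two.
ships_by_availability_count = {1: 17, 2: 2}
total_ships = sum(ships_by_availability_count.values())
total_availabilities = sum(
    count * ships for count, ships in ships_by_availability_count.items()
)

print(connection_count(1))   # ships with a single availability
print(connection_count(2))   # Arleigh Burke and Donald Cook
print(total_ships, total_availabilities)
```

The totals of 19 ships and 21 availabilities agree with the sample described in the data collection section.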

The single box depicted at the bottom of Figure 26 represents the aggregate
forecasted cost of all availabilities, $271.1 million.

Figure 26. Visualization Model (from J. Kornitsky, personal communication, November, 2013)

2. Definitized Estimate, All Ships

Definitized  estimates  are  the  total  projected  costs  of  an  availability  upon 
completion  of  the  planning  phase  of  ship  maintenance,  provided  by  subject  matter  experts 
in  the  planning  process.  According  to  the  Joint  Fleet  Maintenance  Manual  (DON,  2012b), 
the  planning  phase  for  an  availability  for  a  DDG  begins  720  days  before  the  first  day  of 
maintenance  (A-720).  By  this  day,  A-720,  an  availability  must  be  added  to  the  U.S.  Navy 
surface  ship  availability  schedule.  The  next  milestone,  a  letter  of  authorization,  occurs  on 
or  before  A-360  and  obligates  the  stakeholders  to  specific  cost  of  prorate  schedules. 
Through  the  next  three  milestones,  A-240  (50%),  A-120  (80%),  and  A-75  (100%), 
progressively  more  of  the  budgeted  funds  must  be  allocated,  or  locked,  to  specific  work 
items.  By  A-60  the  overall  plan  for  maintenance  must  be  finalized  to  allow  the  detailed 
work  schedule  to  be  formulated  and  cost  estimates  completed.  The  final  cost  estimate,  or 
definitized  work  package,  must  be  finished  by  A-35  and  represents  all  costs  attributed  to 
OW.  After  definitization,  all  additional  work  items  are  considered  to  be  G,  NW,  or  NG 
(DON,  2012b). 
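
Because every planning milestone is expressed as a number of days before the start of the availability, the calendar deadlines follow directly from the start date. The sketch below illustrates this; the milestone descriptions paraphrase the paragraph above, and the start date is hypothetical and chosen only for the example.

```python
from datetime import date, timedelta

# (days before availability start, milestone), paraphrased from the text.
MILESTONES = [
    (720, "Availability added to the surface ship availability schedule"),
    (360, "Letter of authorization"),
    (240, "50% of budgeted funds locked to work items"),
    (120, "80% of budgeted funds locked to work items"),
    (75, "100% of budgeted funds locked to work items"),
    (60, "Overall maintenance plan finalized"),
    (35, "Definitized work package (final cost estimate) complete"),
]


def milestone_dates(start: date) -> dict:
    """Map each 'A-n' label to its calendar deadline for a given start date."""
    return {f"A-{days}": start - timedelta(days=days) for days, _ in MILESTONES}


# Hypothetical availability start date, used only for illustration.
start = date(2013, 1, 7)
deadlines = milestone_dates(start)
for (days, description) in MILESTONES:
    label = f"A-{days}"
    print(label, deadlines[label].isoformat(), description)
```

Any work item added after the A-35 date in such a schedule would fall into G, NW, or NG rather than OW.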

Figure  27,  the  solar  graph  representation  of  the  Definitized  Estimate,  All  Ships, 
shows  how  each  ship  contributed  to  the  total  expected  cost  of  all  the  availabilities 
analyzed. The total of $313.7 million is greater than the $271.1 million total presented in
Figure 26 because, as explained earlier, the visualization model shows
the total costs after the combined incorporation of three different technologies into the
ship maintenance process.

All the figures shown in this section present a parent-child type of relationship
hierarchy, similar to object-oriented programming. In Figure 27 there exists only a simple
relationship, with each instance having assumed a single role. The total definitized
estimate of $313.7 million in the center is the parent, while all the ships, and their total
maintenance costs, are the children. Hence, each parent's number is the sum of the
children connected directly one level down, and this holds at every level. Multi-role
instances, where a single solar graph instance can be both parent and child, will be
presented in later figures, beginning with Figure 29.

Each ship contributed to the total definitized estimate of $313.7 million. The
amount  each  contributed  is  presented  in  three  different  ways.  First,  the  size  of  each 
bubble  signifies  its  cost  relative  to  the  total  cost  bubble  in  the  middle  of  the  model.  The 
larger  the  relative  cost  of  the  ship  identified,  the  larger  the  bubble.  Second,  the  relative 
impact  of  each  ship  on  cost  is  also  identified  by  a  percentage  written  on  the  line 
connecting  each  ship  with  the  total.  Finally,  the  actual  dollar  amount  of  each  ship’s 
impact  upon  the  definitized  estimate  is  shown  either  inside  the  instance  for  larger 
contributors  or  near  the  instance  for  smaller  ones. 
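
The roll-up rule (a parent's number is the sum of its children) and the percentage written on each connecting line can be sketched with a small tree structure. This is an illustration of the idea only, not the vendor's implementation; the class name `Instance` and all dollar values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    name: str
    cost: float = 0.0  # leaf cost in $ millions; ignored when children exist
    children: List["Instance"] = field(default_factory=list)

    def total(self) -> float:
        """A parent's number is the sum of its children, at every level."""
        if not self.children:
            return self.cost
        return sum(child.total() for child in self.children)

    def share_of(self, parent: "Instance") -> float:
        """Percentage shown on the line connecting this instance to its parent."""
        return 100.0 * self.total() / parent.total()


# Hypothetical data: two ships, each both a child of the total and a parent.
ship_a = Instance("Ship A", children=[Instance("Labor", 7.0), Instance("Material", 3.0)])
ship_b = Instance("Ship B", children=[Instance("Labor", 4.0), Instance("Material", 1.0)])
root = Instance("Total", children=[ship_a, ship_b])

for ship in root.children:
    print(f"{ship.name}: ${ship.total():.1f}M, {ship.share_of(root):.1f}% of total")
```

A recursive total keeps the numbers consistent when another layer of children is added, which is the situation that arises with the multi-role instances mentioned above.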

The  Winston  Churchill,  for  example,  which  is  located  at  the  8  o’clock  position  on 
Figure  27,  was  not  the  largest  contributor  to  the  total  definitized  estimate.  However,  a 
brief  visual  analysis  of  the  entire  figure  shows  it  was  not  the  least  significant  either 
because  many  of  the  ship  solar  graph  instances  are  smaller.  The  relative  sizes  and 
organization  of  all  the  instances  enable  an  intuitive  understanding  to  be  quickly 
developed.  The  Winston  Churchill  instance  is  larger  than  the  four  instances  directly 
below  it,  but  it  is  also  smaller  than  the  four  instances  directly  above  it.  The  relative 
location  of  the  Winston  Churchill  instance  enables  a  decision  maker  to  quickly  identify 
that  the  ship’s  relative  contribution  to  the  overall  definitized  estimate  lies  somewhere  in 
the  middle  of  the  pack. 

If  further  understanding  of  the  relative  contribution  is  needed,  the  decision  maker 
would  then  refer  to  the  percentage  indicated  along  the  line  connecting  the  Winston 
Churchill  to  the  total  estimate.  The  Winston  Churchill  accounted  for  3.7%  of  the  total 
definitized  estimate.  However,  if  the  actual  dollar  amount  contributed  to  the  total  is 
desired,  then  the  decision  maker  could  refer  to  the  number  located  within  the  instance.  In 
the case considered, the Winston Churchill accounted for $11.7 million in absolute terms.

Figure 27. Definitized Estimate, All Ships Solar Graph
(from J. Kornitsky, personal communication, November, 2013)

3. Definitized Estimates of the Top 5 Ships

Figure  28,  Definitized  Estimate,  Top  5  Ships,  is  nearly  identical  to  Figure  27 
except  that  it  has  been  modified  to  identify  the  largest  cost  contributors.  To  focus  on  the 
highest  cost  contributors  and  reduce  visual  clutter,  the  number  of  ships  displayed  has 
been  reduced  to  five  and  the  remaining  14  are  aggregated  into  the  “All  Other”  bubble. 

Consider  the  decision  maker  analyzing  the  presentation.  If  the  executive  is  only 
interested  in  the  largest  cost  contributors,  then  the  addition  of  the  other  14  ships  only 
makes  interpretation  of  the  information  more  difficult.  However,  the  aggregation  of  the 
remaining  ships  into  a  single  instance  also  provides  another  view  of  the  data.  In  this 
example,  the  total  definitized  estimate  of  the  other  ships  is  $105.6  million  and  represents 
33.6%  of  the  entire  sum.  This  view  may  be  significant  to  a  decision  maker  who  originally 
thought  that  the  largest  cost  contributors  represented  a  much  larger  portion  of  the  total.  In 
this  figure,  a  decision  maker  would  easily  be  able  to  determine  that  the  impact  of  the 
remaining  14  ships  is  much  greater  than  the  impact  of  any  single  large  cost  contributor. 
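
The "top N plus All Other" view described above amounts to ranking the contributors, keeping the largest few, and summing the rest into one bucket. The sketch below shows one way to do this; the function name and the ship figures are hypothetical and are not the study data.

```python
def top_n_with_other(costs, n=5, other_label="All Other"):
    """Keep the n largest entries and sum the remainder into one bucket."""
    ranked = sorted(costs.items(), key=lambda item: item[1], reverse=True)
    result = dict(ranked[:n])
    rest = ranked[n:]
    if rest:
        result[other_label] = sum(value for _, value in rest)
    return result


# Hypothetical definitized estimates in $ millions (not the study data).
estimates = {"Ship A": 10, "Ship B": 40, "Ship C": 30, "Ship D": 5, "Ship E": 15}
view = top_n_with_other(estimates, n=2)
total = sum(estimates.values())

for name, cost in view.items():
    print(f"{name}: ${cost}M ({100 * cost / total:.1f}%)")
```

In this hypothetical case the aggregated bucket is as large as the second-largest single ship, the kind of observation that the All Other bubble makes visible.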

Alternatively,  if  the  decision  maker  was  more  interested  in  determining  the 
sources  of  the  expenses,  then  an  additional  level  of  detail  would  be  necessary.  While 
Figure  28  provided  cost  information,  the  costs  were  aggregated  at  the  ship  level.  An 
executive  interested  in  determining  the  primary  drivers  of  cost  would  need  more  detailed 
information  that  can  be  found  in  Figure  29. 

Additionally,  the  data  can  be  depicted  according  to  the  desires  of  the  viewer.  In 
this  instantiation  of  the  data,  a  list  of  options  was  created  to  allow  for  grouping 
information  according  to  cost  source.  The  grouping  options  can  be  selected  in  the 
software  and  are  as  follows: 

•  Expense  detail  (includes  labor  and  material). 

•  Type  of  expense  (includes  labor,  sub  labor,  material,  sub  material). 

•  Ship  (includes  each  individual  ship  name). 

•  Work  (includes  OW,  G,  NW,  NG). 

•  Availability  (includes  each  avail  identification  number). 
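
Grouping by one of the options listed above can be thought of as summing cost records over a chosen field. The sketch below assumes a flat list of records with hypothetical field names and values; the actual data model of the visualization software is not described in this report.

```python
from collections import defaultdict

# Hypothetical cost records in $ millions; field names are assumptions.
records = [
    {"ship": "Ship A", "work": "OW", "expense": "Labor", "avail": "AVAIL-1", "cost": 7.0},
    {"ship": "Ship A", "work": "G", "expense": "Material", "avail": "AVAIL-1", "cost": 3.0},
    {"ship": "Ship B", "work": "OW", "expense": "Labor", "avail": "AVAIL-2", "cost": 4.0},
    {"ship": "Ship B", "work": "NW", "expense": "Labor", "avail": "AVAIL-2", "cost": 1.0},
]


def totals_by(rows, dimension):
    """Sum cost over every distinct value of the chosen grouping field."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row["cost"]
    return dict(totals)


for dimension in ("ship", "work", "expense", "avail"):
    print(dimension, totals_by(records, dimension))
```

Every grouping sums to the same overall total, so switching the grouping changes only how the cost is partitioned, not how much cost is shown.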

Figure 28. Definitized Estimate, Top 5 Ships Solar Graph
(from J. Kornitsky, personal communication, November, 2013)

4. Definitized Estimates of Top 5 Ships by Expense Details

Figure 29, Definitized Estimate, Top 5 Ships, Expense Detail, adds one level of
detail to the figure previously discussed and is valuable for identifying cost sources. The
additional details are the two cost categories of labor and material, which can be seen
radiating farther from the graph's center, each labeled with the identification
number of the availability from which it originated. These additional details to the definitized estimate of
the top 5 ships increased the complexity of the parent-child hierarchy and produced
different numbers of children among the ship level instances. For the executive using this
solar graph to make important ship maintenance decisions, it is important to understand
the changes.

First, the parent-child relationship hierarchy has increased in complexity. With the
addition of another level of detail, or another layer of children, the ship name solar
instances have become both parent and child. The ship names are still children to the
parent, total definitized estimate, but are now also parents to the expense details. For
example, located at the one o'clock position in Figure 29, the Barry solar graph instance
has spawned two children, Labor and Material. The Barry, originally only a child to the
total definitized estimate, is now also a parent to its two children. However, this added layer
has produced ship name parents with varying numbers of children, and the reasons may
not be immediately intuitive.

Earlier, both the Arleigh Burke and the Donald Cook ships were identified as
being irregular because they represented multiple availabilities. The addition of expense
detail has further demonstrated the presence of two separate maintenance periods within
each. Just above the three o'clock position in the solar graph, the Arleigh Burke shows
four children. Two are labeled as Labor and two are labeled as Material. Each
one labeled Labor is identified by a unique availability identification number, and the
two labeled Material carry the same two numbers. The Arleigh Burke and Donald Cook
multiple availability instances produced four children as opposed to the two children
generated by the single availability instances of the Barry, Ramage, and Stout ships.

To an executive, the additional level of detail in the solar graph begins to remove
ambiguity and provide clear relationships among the sources of cost. But, if the manner
and method in which the detail is presented is confusing, then the additional information
will only further confound the decision maker. Understanding why ships produced
varying numbers of Labor and Material children is important for the executive to make
appropriate decisions regarding ship maintenance based on the solar graph. However, the
six children subordinate to the All Other instance at the ten o'clock position in Figure 29
also require explanation.

The reason the All Other instance produced six children is two-fold. First, the All
Other instance includes 14 ships and, therefore, 14 availabilities (since the Arleigh Burke
and Donald Cook have already been accounted for). The definitized cost estimate for
each availability has been categorized into Labor and Material, so 14 availabilities
should have generated 28 expense detail children. There are therefore more than just two or four
children available to display. This leads to the second part of the two-fold explanation. In
Figure 29, the number of children to be displayed was arbitrarily chosen. The top 5
largest contributors retained their individual solar instances, and the remaining were
aggregated into the All Other instance. The choice to display only the top 5 ships in the solar
representation with less detail has also affected this graph. The 5 biggest individual
contributors, all of which happen to be Labor instances, are displayed, while the remaining
are aggregated into the All Other instance. Again, the implication for the executive using
this solar graph to form ship maintenance policy decisions is that if the manner and
method of generation are not known, then the insight derived from the graph may be
erroneous. For example, a decision maker who assumed that the All Other category
displayed all its children would misunderstand the graph and believe that only
labor costs were incurred for those 14 ships.

From  Figure  28,  previously  seen,  the  decision  maker  was  interested  in  finding 
more  about  the  cost  sources.  Now  in  Figure  29,  with  an  added  level  of  detail,  the  decision 
maker  could  make  some  more  observations  and  gain  a  deeper  understanding  of  cost 
drivers.  For  instance,  the  top  5  ships  all  demonstrated  that  for  a  given  availability,  labor 
impacted  cost  more  than  material.  Specifically,  consider  the  Barry,  Ramage,  and  Stout. 
The labor costs accounted for percentages ranging from 70.8% to 73.4%. In this small
sample, the decision maker could develop cost baselines indicating that for a given
availability, labor accounted for about 70% of the cost and material accounted for about
30%. Provided that the small sample is an accurate representation of DDG ship
maintenance, the definitized cost estimates of future availabilities could be
compared to the baseline, and predictions could be generated about how the cost profile
might change before work is completed.
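
As a rough sketch of how such a baseline comparison might be automated, the snippet below computes the labor share of an estimate and flags estimates that stray from the roughly 70% labor baseline. The function names, the 5-point tolerance, and the sample estimates are assumptions made for illustration; they do not come from the thesis data.

```python
def labor_share(labor: float, material: float) -> float:
    """Fraction of an availability's cost that is labor."""
    total = labor + material
    if total <= 0:
        raise ValueError("total cost must be positive")
    return labor / total


def deviates_from_baseline(labor: float, material: float,
                           baseline: float = 0.70,
                           tolerance: float = 0.05) -> bool:
    """True when the labor share differs from the baseline by more than
    the tolerance (both values are assumptions, not thesis findings)."""
    return abs(labor_share(labor, material) - baseline) > tolerance


# Hypothetical definitized estimates ($M): (labor, material).
typical = (35.5, 14.5)   # labor share 0.71, close to the baseline
unusual = (45.0, 5.0)    # labor share 0.90, far from the baseline

print(deviates_from_baseline(*typical))
print(deviates_from_baseline(*unusual))
```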

Figure 29. Definitized Estimate, Top 5 Ships, Expense Detail Solar Graph
(from J. Kornitsky, personal communication, November, 2013)

5. Actual Costs of the Top 5 Ships by Type Expense

While  the  definitized  cost  estimate  solar  graphs  do  produce  valuable  information, 
they  represent  only  well-educated  guesses  of  the  actual  cost.  The  value  of  the  next 
visualization  is  that  actual  costs  can  be  traced  to  their  origins,  whether  they  resulted  from 
shipyard  or  contractor  work.  The  next  solar  graph  provides  actual  cost  (Figure  30)  and  is 
organized  by  the  top  5  ships  with  an  additional  level  of  detail.  Figure  30  also  provides  the 
additional  information  of  Type  of  Expense. 

Most noticeably, the total actual cost, represented by the largest solar instance in
the center of the graph, has increased to $435.5 million. Referring back to the
previous figure (Figure 29), the definitized cost was estimated to be $313.7 million, so
costs increased by 38.8%. A visualization tool enables the decision maker to drill
down further to identify the largest cost drivers.
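
The growth percentage quoted above is a simple relative change. A minimal sketch, using the rounded totals from the text (the function name `percent_change` is introduced here for illustration):

```python
def percent_change(baseline: float, compared: float) -> float:
    """Relative change from baseline to compared, in percent."""
    if baseline == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return (compared - baseline) / baseline * 100.0


# Totals in $M as quoted in the text: definitized estimate, then actual cost.
total_growth = percent_change(313.7, 435.5)
print(f"{total_growth:.1f}%")  # prints 38.8%
```

Because the inputs are already rounded to one decimal place, the result can differ by a tenth of a percent from a value computed on the unrounded spreadsheet data.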

The  types  of  expenses  figure  provides  further  drill  down  into  the  cost  sources. 
Whereas  expense  detail  was  broken  down  into  only  labor  and  material  categories,  type 
expense  splits  those  into  (shipyard)  labor,  sub  (contractor)  labor,  (shipyard)  material,  and 
sub  (contractor)  material.  From  here  forward,  the  additional  description  in  parentheses 
will  be  excluded  but  the  terms  will  retain  their  definitions.  Labor  and  material,  in  the 
context  of  type  expense,  refer  to  the  labor  and  material  costs  associated  with  the  shipyard 
hosting  the  availability.  Sub  labor  and  sub  material  refer  to  the  same  costs,  but  those 
associated  with  the  expense  incurred  by  subcontractors. 

For the Arleigh Burke, at the four o’clock position on the figure, the definitized
estimate was $46.9 million and the actual cost was $58 million, an increase of
23.7%. A decision maker who knows that labor is a larger contributor to cost than
material will want to know which type of labor expense is more responsible, the
shipyard or the subcontractors. In the case of the Arleigh Burke, sub labor accounted
for 50.2% of total availability cost, whereas labor represented only 19.2%. Because sub
labor represents a majority of cost for the Arleigh Burke, it should perhaps be
examined for cost reduction opportunities.

The bubble charts of either definitized estimates or actual costs provide decision
makers  with  valuable  insight.  However,  the  size  difference  between  estimates  and  actual 
costs  would  provide  an  understanding  of  the  sources  of  cost  growth.  For  instance,  an 
executive  is  interested  in  determining  the  primary  driver  of  increased  costs.  While  the 
previous  solar  graphs  possess  the  necessary  information,  further  calculations  are  needed 
to  figure  changes  in  cost.  If  the  relative  and  actual  changes  in  cost  were  displayed  on  the 
same  graph,  then  the  decision  maker  would  be  able  to  easily  identify  the  primary  drivers 
of cost growth and cost savings. The next three figures (Figures 31-33) demonstrate the
concept of representing both the definitized estimates and actual costs simultaneously.


Figure 30. Actual Cost, Top 5 Ships, Type Expense Solar Graph
(from J. Kornitsky, personal communication, November, 2013)

6. Definitized Estimate versus Actual of the Top 5 Ships by Type Expense

Figure 31, Definitized Estimate versus Actual, Top 5 Ships, Type Expense,
displays a zoomed-in look at the comparison format to introduce its new
characteristics and to review some old ones. Starting at the nine o’clock position on the
bubble chart (Figure 31), the first characteristic examined is the shell. The shell thickness
and color represent the amount of change and whether the change was cost
growth (red) or cost savings (green).

Proceeding  clockwise,  the  terms  are  familiar,  but  their  presentation  is  new. 
Definitized  cost  estimate  and  actual  cost  refer  to  the  estimated  cost  at  the  end  of  planning 
and  the  cost  incurred  upon  completion  of  the  availability,  respectively.  In  this  figure,  the 
definitized  estimate  is  represented  by  the  inner  layer  of  the  shell  and  the  actual  cost  by  the 
outer layer. For example, the largest bubble represents total cost. The inner layer shows
how large the instance would be if only the total definitized estimate, $313.7 million, were
displayed. The outer layer shows how large the instance would be if only the total actual
cost, $435.5 million, were displayed. The difference between the layers, or the thickness
of the shell, represents the change in cost and is numerically indicated by the percentage
shown, 38.7%. The definitized estimate was less than the actual cost, which means that
there was cost growth, represented by the color red.

Although the next two aspects of the bubble chart are familiar, they require further
clarification. First, the number represented in millions of dollars is the final state of the
instance. For this comparison between definitized estimate and actual cost of the Barry,
located at the one o’clock position in Figure 31, the final state is the actual cost, which
was $70.1 million. Second, the percentage immediately below the actual cost value
indicates the change from the initial state (definitized estimate) to the final state (actual
cost). In the case of the Barry, the cost grew by 46% from the definitized estimate to the
actual cost.

The final characteristic identified on the close-up is another percentage. Whereas
the percentage within the instance represents cost growth, the percentage on the line
between parent and child represents the proportion of the parent’s cost that the child
contributed. In the figure, the dialog box arrow points at this percentage for a child
of the Barry: the child instance accounted for 17.4% of the total actual
cost incurred by the Barry.
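
The characteristics of a comparison instance described in this subsection (shell color, shell thickness, percentage change, and share of the parent) can be modeled with a small data structure. The sketch below is one reading of the text, not the visualization software's implementation: the class and attribute names are hypothetical, and treating shell thickness as the dollar difference between the two layers is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComparisonInstance:
    name: str
    baseline: float   # inner layer, e.g. definitized estimate ($M)
    compared: float   # final state, e.g. actual cost ($M)

    @property
    def shell_thickness(self) -> float:
        # Assumption: thickness is proportional to the dollar difference.
        return abs(self.compared - self.baseline)

    @property
    def shell_color(self) -> str:
        # Red marks cost growth; green is used otherwise.
        return "red" if self.compared > self.baseline else "green"

    @property
    def percent_change(self) -> Optional[float]:
        if self.baseline == 0:
            return None  # no baseline to divide by
        return (self.compared - self.baseline) / self.baseline * 100.0

    def share_of(self, parent: "ComparisonInstance") -> float:
        """Percentage of the parent's final-state cost contributed by self."""
        return self.compared / parent.compared * 100.0


# Values in $M quoted in the text for the Total and the Arleigh Burke.
total = ComparisonInstance("Total", 313.7, 435.5)
burke = ComparisonInstance("Arleigh Burke", 46.9, 58.0)

print(total.shell_color, round(total.shell_thickness, 1))
print(round(burke.percent_change, 1), round(burke.share_of(total), 1))
```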

Figure 31. Definitized Estimate versus Actual, Top 5 Ships, Type Expense, Solar Graph Close-up
(from J. Kornitsky, personal communication, November, 2013)

7. Definitized Estimate versus Actual of the Top 5 Ships by Type Expense


The most distinguishing feature of Figure 32, Definitized Estimate versus Actual,
Top 5 Ships, Type Expense, which displays the top 5 ships with
an additional level of detail organized by type expense, is the near absence of green. The
two instances of cost savings, both titled All Other and located at about the nine and five
o’clock positions on the outermost ring, are relatively insignificant, representing only
0.1% of the total cost, $435.5 million. Examples of cost growth are abundant, but an
examination of the largest contributor to total cost may produce valuable insight for the
executive-level decision maker.

The All Other instance at the ten o’clock position represents 14 ships. Those 14
ships  accounted  for  $150.5  million,  or  34.5%,  of  the  total  actual  cost.  The  red  shell  and 
the  percentage  inside  the  All  Other  instance  together  indicate  42.5%  aggregate  cost 
growth  for  the  14  ships.  These  numbers  reveal  that  the  All  Other  category  would  be  an 
area  for  a  decision  maker  to  examine  more  closely  in  an  attempt  to  identify  the  drivers  of 
cost  growth.  A  cursory  glance  at  the  children  of  the  All  Other  instance  shows  that 
subcontractors,  both  sub  labor  and  sub  material,  experienced  more  than  50%  cost  growth. 
Therefore,  subcontractors  are  a  primary  driver  of  cost  growth  for  at  least  the  14  ships 
represented  by  the  All  Other  instance. 

The visualization software provides the ability to delve into the data to discover
more detail. For example, if personnel are preparing a presentation based on Definitized
Estimate versus Actual, Top 5 Ships, Type Expense data (Figure 32), and the decision
maker asks, “What was the definitized estimate for the Barry?” then the
answer can readily be found. Rather than return to previous solar graphs, the presenter
can simply select the Barry instance and pull up a bar chart which, among other
information, displays the definitized estimate. Perhaps the decision maker requests even
finer details. The software can drill down five levels of detail and can
reproduce the data located on the original spreadsheet. So, more detail is available than
just what is displayed on the static solar graphs presented here. Refer to the two figures
near the end of this section titled Barry Drill Down for examples (Figures 44 and 45).

Figure 32. Definitized Estimate versus Actual, Top 5 Ships, Type Expense Solar Graph
(from J. Kornitsky, personal communication, November, 2013)

8. Definitized Estimate versus Actual of the Top 5 Ships by Work

Figure  33,  Definitized  Estimate  versus  Actual,  Top  5  Ships,  Work,  is  the  last  of 
three  figures  showing  simultaneous  display  of  both  definitized  estimates  and  actual  costs, 
and  provides  an  additional  detail  of  work  instead  of  type  of  expense.  As  previously 
discussed,  maintenance  work  is  broken  down  into  four  types:  OW,  G,  NW  and  NG. 
Changing the detail to allocate cost by work creates a couple of peculiarities, both related to
the definitions of the work and important for the executive-level decision maker to
understand.

Two anomalies appear when the data are changed to show work details. The
first peculiarity is that there are now a significant number of green-shelled instances,
which would ordinarily indicate cost savings. However, all the percentages within the
green-shelled instances are left blank, revealing that no change (0%) has taken place. That
is because the instances represent OW, which does not change after the completion of
planning, making the percentage within the instance irrelevant. For example, refer to the
Arleigh Burke solar graph instance at the four o’clock position in Figure 33. The green-shelled
child instance attached to the Arleigh Burke is labeled Original for OW. The percentage
displayed is blank, which indicates that 0% change in cost has occurred, because any change
in cost is recorded by the other categories of work. The percentage which is important for
the decision maker to acknowledge, though, is the number indicated along the line
connecting the child to parent. That percentage, 80.8%, indicates the portion of the
Arleigh Burke’s total actual cost for which OW accounted.

The  second  peculiarity,  also  a  result  of  definitions,  is  that  the  instances  for  the 
other  three  categories  of  work  are  all  solid  red.  Solid  red  indicates  that  the  baseline,  or 
definitized  estimate  in  this  case,  was  $0  and  the  actual  cost  is  all  cost  growth.  That  is 
because  the  other  three  categories  of  work  (G,  NW,  NG)  all  result  from  work  needed  in 
addition  to  the  OW  and  are,  therefore,  cost  growth  by  definition.  Continuing  with  the 
examination  of  the  Arleigh  Burke,  its  larger  solid  red  child  is  labeled  Growth.  The 
percentage within the instance is blank, but again, it is less important. The
values important to the decision maker are the actual cost of G, $8.4 million, and the
proportion of the Arleigh Burke’s total actual cost for which G work accounted, 14.5%.

With the two peculiarities defined and understood, reconsider the previous figure to
identify a cost driver.

The  decision  maker  examined  the  largest  All  Other  instance  more  closely  and 
determined  that  subcontractors  were  a  primary  driver  of  cost  growth.  The  decision  maker 
might  then  ask  to  see  the  additional  detail  organized  by  work  to  further  expand  his  or  her 
understanding. Again looking at the All Other instance located at the ten o’clock position
in Figure 33, the largest driver of cost growth is NW, which accounted for $34.4 million,
or 22%, of the actual costs for the 14 All Other ships. By combining the knowledge derived
from both graphs (Definitized Estimate versus Actual, Top 5 Ships, Type Expense
and Work, Figures 32 and 33, respectively), the keen decision maker might direct
staff personnel to investigate NW performed by subcontractors for cost savings
opportunities.

Figure 33 demonstrates how costs aggregate from the bottom up. Costs are
created at the operational level and occur in different forms. Here, the forms are
categorized according to the classification of work that created the cost. As the costs
move inward from the outer rings of the solar graph, they are aggregated into ship instances,
which provide less cost detail but are valuable as another way of looking at cost. Finally,
all the ships’ actual costs are aggregated into the center solar instance, Total. The
visualization software offers the opportunity to view the cost data at many levels of
detail, each of which delivers valuable information for decision makers.
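
The bottom-up aggregation described above amounts to a recursive roll-up over a cost hierarchy. A minimal sketch follows, assuming the hierarchy is held as nested dictionaries whose leaves are costs; the `roll_up` function, the ship names, and every dollar value are hypothetical and serve only to show the mechanism.

```python
from typing import Dict, Union

# A node is either a leaf cost or a mapping of child names to nodes.
CostTree = Union[float, Dict[str, "CostTree"]]


def roll_up(node: CostTree) -> float:
    """Sum leaf costs upward: work categories -> ship -> Total."""
    if isinstance(node, dict):
        return sum(roll_up(child) for child in node.values())
    return float(node)


# Hypothetical actual costs ($M) by classification of work.
fleet = {
    "Ship A": {"OW": 40.0, "G": 6.0, "NW": 3.0, "NG": 1.0},
    "Ship B": {"OW": 30.0, "G": 4.0, "NW": 5.0, "NG": 1.0},
}

for ship, work in fleet.items():
    print(ship, roll_up(work))   # ship-level instance
print("Total", roll_up(fleet))   # center instance
```

Each level of the dictionary corresponds to one ring of the solar graph, so adding a further level of detail only means nesting another dictionary under a leaf.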

Figure 33. Definitized Estimate versus Actual, Top 5 Ships, Work Solar Graph
(from J. Kornitsky, personal communication, November, 2013)

9. Simulations 1 and 2: Introduction of 3DP and AM Radical


Visualization  tools  provide  decision  makers  with  insights  into  historical  data,  and 
more  importantly,  offer  forecasting  capabilities.  Before  implementing  process  changes, 
which  involve  risk  and  uncertainty,  an  executive  could  use  bubble  charts  to  forecast  the 
effects  of  such  changes.  Consider  the  following  example. 

The  executive-level  decision  maker  has  analyzed  the  figures  previously  presented 
and  has  concluded  that  changes  to  the  ship  maintenance  process  are  necessary  to  control 
cost  growth.  Three  technologies  have  been  identified  to  reduce  costs:  3D  Printing  (3DP), 
3D  Laser  Scanning  Technology  (3D  LST),  and  Collaborative  Product  Lifecycle 
Management  (CPLM).  To  test  this  hypothesis,  two  simulations  were  conducted  with 
differing  implementation  strategies.  In  Simulation  1,  we  applied  only  3DP  technology 
while  in  Simulation  2,  all  three  technologies  (3DP,  3D  LST,  and  CPLM  combined)  were 
applied to the ship maintenance process. Simulation results, which could identify
potential cost savings, are discussed further in this section.

To  quantify  the  potential  benefits  of  those  technologies,  the  Knowledge  Value 
Added  (KVA)  methodology  was  applied.  KVA  assigns  a  value  to  the  knowledge  assets  of 
an  organization  (Housel  &  Bell,  2001)  and  was  used  to  forecast  the  effect  that  3DP,  3D 
LST, and CPLM technologies would have on U.S. Navy ship maintenance programs. One
prior study found that 3DP and CPLM could result in as much as 81% cost savings (Kenney,
2013). Another study determined that as much as 84% cost savings could result from the
use  of  3D  LST  and  CPLM  in  U.S.  Navy  ship  maintenance  programs  (Komoroski,  2005). 
The  potential  impact  of  these  three  technologies  has  been  determined  to  be  substantial. 
Therefore,  they  were  used  to  demonstrate  the  ability  of  the  software  program  to  create 
intuitive  solar  graphs  of  the  cost  savings  generated  by  their  implementation. 

In the previous set of comparison figures, definitized estimate was the baseline
and actual cost was the value compared. In the next set of four comparison figures
(Figures 34-37), the baseline and the value compared are changed to examine the effect
of three different technologies on ship maintenance actual cost. In the first two figures
(Figures 34 and 35), the actual cost is the baseline and the forecasted effect of 3DP
only is the compared value. The next two figures (Figures 36 and 37) visualize the effect
that 3DP, 3D LST, and CPLM combined have on ship maintenance costs, labeled as
Additive Manufacturing (AM) Radical.

a. Actual versus 3DP for the Top 5 Ships by Type Expense

Figure  34,  Actual  versus  3DP,  Top  5  Ships,  Type  Expense,  visualizes  the  effect  of 
implementing  3DP  alone  into  the  ship  maintenance  process  on  actual  cost.  The  familiar 
top  5  ship  format  was  maintained  and  the  additional  level  of  detail  is  organized  by  type 
expense.  The  baseline  is  the  actual  cost  incurred  and  the  compared  value  is  the 
backcasted  effect  that  3DP  would  have  had  on  actual  cost. 

To  the  executive-level  decision  maker  analyzing  the  effect  of  3DP  on  U.S.  Navy 
ship  maintenance,  this  figure  provides  two  important  pieces  of  information.  The  first  is 
that  overall,  the  actual  cost  of  ship  maintenance  can  be  reduced  with  the  implementation 
of  3DP  technology.  Figure  34’s  center  instance  shows  that  the  effect  of  3DP  on  the  ship 
maintenance  process  could  have  reduced  the  Total  cost  by  2.1%  as  is  indicated  by  the 
percentage  and  the  green  shell.  The  cost  of  ship  maintenance  with  the  incorporation  of 
3DP  is  now  $426.2  million  versus  the  original  $435.5  million  for  a  savings  of  $9.3 
million.  The  Barry,  again  located  at  the  one  o’clock  position,  is  the  ship  which 
demonstrates  the  largest  percentage  cost  savings  at  6.1%  and  reduced  costs  across  all 
types  of  expense. 

Second,  not  every  ship  may  benefit  from  the  use  of  3DP  technology.  Just  above 
the  three  o’clock  position  on  Figure  34,  the  Stout  indicates  1.4%  cost  growth  for  a 
backcasted  Total  cost  of  $64.1  million,  or  $0.9  million  greater  than  the  original  cost. 
Drilling  down  one  level  of  detail  into  expense  type,  the  decision  maker  can  easily 
determine  that  every  category  of  expense  contributed  to  the  cost  growth  for  the  Stout. 
However,  additional  levels  of  detail  are  available  and  the  executive  may  request  that 
more  information  be  displayed  to  help  identify  the  primary  drivers  of  cost  growth  for  the 
Stout and/or the leading sources of cost savings for the Barry. Therefore, the next figure,
Figure 35, adds another layer of detail organized by work.

Figure 34. Actual versus 3DP, Top 5 Ships, Type Expense Solar Graph
(from J. Kornitsky, personal communication, November, 2013)

b. Actual versus 3DP of the Top 5 Ships by Type Expense, Work

Figure  35,  Actual  versus  3DP,  Top  5  Ships,  Type  Expense,  Work,  is  the  second  in 
the  series  of  comparison  figures.  It  allows  the  decision  maker  to  visually  drill  down  into 
the  data  even  further.  In  the  immediately  previous  figure,  the  Barry  displayed  cost 
savings  across  all  types  of  expense  and  the  Stout  indicated  cost  growth  in  the  same  way 
with  the  implementation  of  3DP  into  the  ship  maintenance  process.  The  additional  level 
of  detail,  however,  indicates  where  each  ship  derived  its  savings  or  growth  with  3DP 
according  to  classification  of  work. 

The  executive  drilling  down  into  the  3DP  backcasted  cost  data  for  the  Barry  can 
quickly  identify  one  classification  of  work,  in  one  type  of  expense,  which  produced  cost 
growth. The only red-shelled solar graph instance subordinate to the Barry in Figure 35 is
the NG instance, subordinate to Sub Labor, which has been backcasted to account for
$1.3 million of Sub Labor cost. However, the percentage growth is not displayed
because the software limits the information shown to reduce clutter and increase
clarity. The executive requiring more information need only select the red-shelled
NG instance, and more information is immediately available, including the percentage of
cost  growth.  If  the  decision  maker  were  to  decide  to  implement  the  3DP  only  strategy, 
then  the  NG  work  attributed  to  Sub  Labor  may  be  an  aspect  which  should  be  looked  at 
for  improvement. 

The executive examining the Stout more closely, at the two o’clock position in Figure 35,
can quickly see that even though the aggregate change in cost is cost growth,
there are indications of possible cost savings. Immediately subordinate to the Stout, Sub
Labor is backcasted to account for $31.3 million. Again, the percentage increase in cost is
not displayed, but is available by selecting the solar graph instance. Even though the Sub
Labor instance indicates cost growth, there are children subordinate to Sub Labor that
signify cost savings. For example, the Growth instance is green-shelled and is backcasted
to account for $6.4 million. For the decision maker, this figure forecasts the possible
effect of implementing 3DP into ship maintenance using historical data, and provides the
ability to examine the effect a particular technology might have on cost without the risk
and uncertainty involved with actual implementation.

Figure 35. Actual versus 3DP, Top 5 Ships, Type Expense, Work Solar Graph
(from J. Kornitsky, personal communication, November, 2013)

c. Actual versus AM Radical of the Top 5 Ships by Type Expense

The  next  two  figures  (Figures  36  and  37)  compare  the  baseline,  actual  cost,  to  the 
backcasted  effect  that  the  implementation  of  all  three  combined  technologies  might  have 
had  on  cost.  The  structure  of  the  graph  remains  familiar,  but  the  increase  in  cost  savings 
demonstrates  the  ability  of  the  software  to  produce  intuitive  solar  graphs  which  easily 
communicate the differences in effect on cost. Figure 36, Actual versus AM Radical, Top
5 Ships, Type Expense, reverts to the format of Actual versus 3DP, Top 5 Ships,
Type Expense (Figure 34) with less detail, but easily demonstrates the difference in cost
savings. The substantial increase in cost savings is communicated most intuitively by
the thickness of the solar graph instance shells, but is also indicated by the absolute and
relative values displayed in or near the instance.

To the executive-level decision maker concerned with cost, the most evident
display of cost savings is the center instance. The Total backcasted cost of ship
maintenance for all 19 ships, had 3DP, 3D LST, and CPLM technologies combined been
implemented, is $271.1 million, or 37.7% cost savings under actual cost. The difference,
$164.4  million,  could  have  been  used  to  finance  other  needs  such  as  system  upgrades, 
structural  improvements,  or  reducing  the  number  of  maintenance  jobs  deferred  until  the 
next  availability  due  to  shrinking  fiscal  budgets.  The  decision  maker  analyzing  the 
change  in  cost  might  also  be  interested  in  understanding  the  difference  in  cost  savings  of 
individual  ships. 
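
The comparison of the two implementation strategies against the actual-cost baseline can be tabulated with a short helper. This is a sketch only: the function name `savings_vs_actual` is hypothetical, and the inputs are the rounded Total values quoted in the text for the actual cost, the 3DP-only backcast, and the AM Radical backcast.

```python
from typing import Dict, Tuple


def savings_vs_actual(actual: float,
                      scenarios: Dict[str, float]) -> Dict[str, Tuple[float, float]]:
    """Return {scenario: (dollar savings, percent savings)} against actual.

    A negative value means the scenario backcasts cost growth.
    """
    return {
        name: (actual - cost, (actual - cost) / actual * 100.0)
        for name, cost in scenarios.items()
    }


# Total costs in $M as quoted in the text.
total = savings_vs_actual(435.5, {"3DP only": 426.2, "AM Radical": 271.1})
for name, (dollars, pct) in total.items():
    print(f"{name}: ${dollars:.1f}M saved ({pct:.1f}%)")
```

With these inputs the helper reproduces the $9.3 million (2.1%) and $164.4 million (37.7%) savings cited for the two simulations.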

In contrast to the implementation strategy of 3DP-only, which slightly increased
cost for one of the top 5 ships, AM Radical decreased costs for all top 5 ships. There
appear to be substantial cost savings in the All Other solar graph instance located at the
ten o’clock position in Figure 36 as well, but current settings prevent concluding that all
19 ships incurred cost savings. In the case of the Barry, cost savings are significantly
increased. With implementation of 3DP-only, the backcasted cost was $65.8 million, or
6.1% cost savings. With the use of all three technologies, or AM Radical implementation,
the backcasted cost for the Barry is $43.9 million, a cost savings of 37.3% when
compared with actual cost. Drilling down one layer of detail, two of the type expense
children subordinate to the Barry have thicker green shells than the others, an intuitive
indication of substantial cost savings. In fact, Sub Labor and Sub Material account for
almost 80% of the increase in the cost savings of AM Radical over the 3DP-only
implementation strategy for the Barry.

Earlier, in the description of the Definitized Estimate versus Actual, Top 5
Ships, Type Expense (Figure 32) solar graph, the executive-level decision maker
identified subcontractor labor and material as primary drivers of cost growth. The keen
decision maker might begin to hypothesize that a possible solution to subcontractor labor
and material cost growth is the implementation of all three technologies. However,
another layer of detail is available, and it may provide either supporting or
contradictory evidence.

Figure 36. Actual versus AM Radical, Top 5 Ships, Type Expense Solar Graph
(from J. Kornitsky, personal communication, November, 2013)

d. Actual versus AM Radical of the Top 5 Ships by Type Expense, Work

Figure  37,  Actual  versus  AM  Radical,  Top  5  Ships,  Type  Expense,  Work,  the 
fourth  and  final  solar  graph  of  this  comparison  series,  allows  the  executive  level  decision 
maker  to  visually  drill  down  into  the  data  even  further.  The  additional  layer  of  detail  is 
organized  by  work  and  provides  more  information  about  the  sources  of  cost  savings. 

An executive analyzing this figure would first notice the most obvious aspect, the
fact that AM Radical implementation creates cost savings throughout the entire data set.
Whereas 3DP-only implementation indicated cost growth in one ship and in various type
expenses and classifications of work, the backcasted effect of AM Radical implementation
is cost savings in every instance. For example, with 3DP-only
implementation, the Actual versus 3DP, Top 5 Ships, Type Expense, Work solar graph
(Figure 35) identified cost growth in one classification of work, NG, which accounted for
$1.3 million of Sub Labor. But with AM Radical implementation, the NG instance
subordinate to the Barry on this solar graph, Figure 37, indicates cost savings and now
accounts for $0.99 million. As stated before, the percentage change is not displayed to
reduce clutter; however, it is available by simply selecting the instance. Possibly more
interesting to the executive-level decision maker is the case of the Stout, which changed
from a source of cost growth to a significant driver of cost savings.

In the previous solar graph, Actual versus 3DP, Top 5 Ships, Type Expense,
Work (Figure 35), showing the backcasted effect of 3DP, the Stout displayed an absolute
cost of $64.1 million and cost growth of 1.4%. The classification of work which
contributed most to the cost of the Stout was OW, a child of Sub Labor, which indicated an
absolute cost of $21.5 million. But with AM Radical implementation, this solar graph
(Figure 37) backcasted the cost to have been $15.3 million for the OW associated with
Sub Labor, a cost savings of $6.2 million when compared to 3DP-only implementation.

The  Stout,  as  well  as  the  other  top  5  ships,  could  have  produced  significant 
cost  savings  had  the  AM  Radical  approach  been  implemented.  However,  the  actual  costs 
have  already  been  incurred.  The  significance  of  this  series  of  figures  is  that  a  decision 
maker  can  visualize  the  effect  the  technology  implementation  strategies  might  have  had 
on  historical  data  and  then  make  predictions  about  the  effect  on  future  costs.  The  decision 
maker, armed with the predictions derived from the solar graphs, weighs additional
executive-level organizational considerations and is then able to make better
cost control choices for the future of U.S. Navy ship maintenance.

Figure 37. Actual versus AM Radical, Top 5 Ships, Type Expense, Work Solar Graph
(from J. Kornitsky, personal communication, November, 2013)

10. Alternative Figures

The final series of solar graph figures (Figures 38-40) demonstrates the flexibility
of visualization tools, enabling drilling down into specific details. All but the first
of the figures described thus far have used the top 5 ships structuring concept for the
first level of detail. While the use of a single method of organizing the first layer of
detail made comprehension of the graphs easier, it understated the flexibility of the
third-party software. Therefore, other methods of organizing and presenting the data are
explored in this section of three comparison figures.

a.  Definitized  Estimate  versus  Actual  of  the  Type  Expense  by  Work 

Figure 38, Definitized Estimate versus Actual, Type Expense, Work, is useful for
the decision maker interested in analyzing cost growth without discriminating by ship.
This figure reverts to using the definitized estimate as the baseline and the actual
cost as the comparison, as in Definitized Estimate versus Actual, Top 5 Ships, Type
Expense and Definitized Estimate versus Actual, Top 5 Ships, Work (Figures 32 and 33,
respectively). However, the top 5 ships are not used as an organizing concept. Instead,
the first layer of detail is grouped by type expense and the additional layer is
organized by work.

Consider the theory arrived at by the executive during analysis of the Definitized
Estimate versus Actual, Top 5 Ships, Type Expense figure (Figure 32). The decision
maker noted that subcontractor labor and material appeared to be primary drivers of cost
growth. In this solar graph, Figure 38, the Sub Labor instance appears at ten o’clock and
the Sub Material instance at three o’clock. The indicated percentages of cost growth are
45% and 44%, respectively. Compared to the cost growth of Labor and Materials
associated with the shipyard, 28.5% and 27.1%, respectively, subcontractors again
appear to be primary drivers of cost growth. The decision maker is interested in
understanding the causes of subcontractor cost growth at a deeper level of detail.
Therefore, the executive might analyze the graph further and discover that NW is the
largest absolute contributor to both Sub Labor at $31.5 million and Sub Material at
$13.5 million.

The same information was derived from the analysis of two sequential solar graphs
described earlier: Definitized Estimate versus Actual, Top 5 Ships, Type Expense and
Definitized Estimate versus Actual, Top 5 Ships, Work (Figures 32 and 33). The same
understanding was thus derived from two distinct presentations, one with two graphs and
the other with this one graph. Arriving at the same conclusion from different
presentations of the data builds the decision maker’s confidence that the data are
accurate and the visualization methods valid.

Figure 38. Definitized Estimate versus Actual, Type Expense, Work Solar Graph
(from J. Kornitsky, personal communication, November, 2013)

b. Definitized Estimate versus Actual of the Work by Ship

The  remaining  two  alternative  figures  are  complementary.  Figure  39,  Definitized 
Estimate  versus  Actual,  Work,  Ship,  is  a  figure  that  a  decision  maker  could  use  to 
identify  problem  areas  of  cost  growth  based  on  classification  of  work.  Figure  40,  Actual 
versus  AM  Radical,  Work,  Ship,  keeps  the  same  organization  format,  but  enables  the 
decision  maker  to  analyze  how  the  implementation  of  the  three  technologies  could  have 
created  cost  savings. 

In Figure 39, which demonstrates cost growth, there is only one peculiarity, which has
already been explained. All the thin, green-shelled instances on the left side of the
graph represent 0% growth because OW, by definition, cannot grow in expense. Likewise,
the solid, red-shelled instances on the right side represent only cost growth that
occurred and are classified as NW, G, or NG because of their definitions.

Figure  39  is  important  to  the  executive-level  decision  maker  because  it  exhibits 
data  already  presented  in  another  format.  The  other  format  was  the  Definitized  Estimate 
versus  Actual,  Top  5  Ships,  Work  figure  (Figure  33),  which  organized  the  first  level  of 
detail  by  ship  and  the  second  level  by  work.  In  this  graph,  the  organizing  concepts  have 
been reversed. If the same deduction can be derived from this solar graph, then the
decision maker’s confidence in their ability to make accurate and valid choices for the
future of U.S. Navy ship maintenance processes increases.

The  deduction  already  made  by  the  decision  maker  was  that  NW,  over  the  other 
classifications  of  work,  accounted  for  the  largest  portion  of  cost  growth.  Referring  to  the 
Definitized  Estimate  versus  Actual,  Work,  Ship  solar  graph  (Figure  39),  a  quick  visual 
scan  over  the  classification  of  work  instances  creates  an  intuitive  understanding.  The  NW 
instance  is  the  largest  indicator  of  cost  growth.  Further  examination  by  the  decision 
maker  provides  the  dollar  values  which  support  the  intuitive  perception.  The  NG 
instance,  located  at  the  five  o’clock  position,  accounted  for  $7.7  million.  The  G  instance, 
located  just  below  the  three  o’clock  position,  represented  $47.1  million.  Finally,  the  NW 
instance,  located  at  the  two  o’clock  position,  produced  $66.8  million  in  cost  growth.  Even 
though the data were organized and presented differently, the same deduction was
reached: NW was the primary driver of cost growth.
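
The comparison the decision maker performs by reading dollar values off Figure 39 can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the visualization software; the variable names are chosen here for the example, and the dollar values are the three quoted in the paragraph above.

```python
# Cost growth by classification of work, in millions of dollars,
# as quoted in the text for Figure 39.
cost_growth = {"NG": 7.7, "G": 47.1, "NW": 66.8}

total_growth = sum(cost_growth.values())

# The primary driver is the classification with the largest cost growth.
primary_driver = max(cost_growth, key=cost_growth.get)

# Share of total cost growth attributable to each classification, in percent.
shares = {work: 100 * value / total_growth for work, value in cost_growth.items()}

print(f"Total cost growth: ${total_growth:.1f} million")
for work, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{work}: {share:.1f}% of total cost growth")
print(f"Primary driver: {primary_driver}")
```

Ranking the shares in this way simply restates numerically what the relative sizes of the instances on the solar graph convey visually.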

If  the  executive  were  interested  in  determining  the  ships  that  produced  the  largest 
cost  growth,  then  simply  referring  to  the  additional  level  of  detail  would  provide  the 
answer.  For  example,  since  NW  was  the  primary  driver  of  cost  growth,  identification  of 
the  largest  contributing  ship  may  provide  a  specific  case  for  further  analysis  of  cost 
growth.  Referring  to  the  NW  instance,  located  at  the  two  o’clock  position  in  Figure  39, 
the  child  ship  which  represents  the  largest  portion  of  cost  growth  is  the  Donald  Cook.  The 
decision  maker,  remembering  that  the  Donald  Cook  represents  two  availabilities,  would 
drill  down  into  the  next  level  of  detail  by  selecting  the  Donald  Cook.  Then,  the 
determination  would  be  made  whether  either  one  of  the  Donald  Cook  availabilities,  or  the 
next  largest  individual  ship  (the  Barry)  was  the  ship  representing  the  most  cost  growth  for 
NW.  Once  the  ship  was  identified,  then  the  executive  could  direct  further  study  into  the 
causes  of  cost  growth. 

Figure 39. Definitized Estimate versus Actual, Work, Ship Solar Graph
(from J. Kornitsky, personal communication, November, 2013)

c. Actual versus AM Radical of the Work by Ship

Actual versus AM Radical, Work, Ship (Figure 40) shows the backcasted effect
that the implementation of all three technologies combined might have had on U.S. Navy
ship maintenance costs. The figure maintains the organizing structure of the immediately
previous solar graph to provide easy comparison for the executive-level decision maker.

For example, suppose the decision maker is interested in the overall effect of AM
Radical implementation compared to the definitized cost. The center instance in Figure
40, Total, indicates the bottom-line cost, and the resulting cost savings, that may have
occurred had the AM Radical implementation strategy been employed. At $271.1 million,
AM Radical implementation might have resulted in 37.7% cost savings, but that is
compared to actual cost. Referring to the OW instance located at the nine o’clock
position on the previous solar graph, Definitized Estimate versus Actual, Work, Ship
(Figure 39), the value is $313.7 million. Because of the definition of OW and the
position of the OW instance at the first level of detail, it also represents the total
definitized estimate. Simple math shows that AM Radical implementation might have caused
the ships analyzed to come in under budget by $42.6 million, or 13.6%. Cost growth was
turned into cost savings through the backcasted effect that AM Radical implementation
might have had on the ships studied. To the executive-level decision maker, this is
important because if the three technologies (3DP, 3D LST, and CPLM combined) were
selected for implementation, then future U.S. Navy ship maintenance budgets might be
reduced and result in reallocation of funding to higher priority projects.
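
The "simple math" referred to above can be written out explicitly. The following Python sketch uses only the dollar values quoted in the text; it assumes, for the cross-check at the end, that total actual cost equals the definitized estimate plus the NG, G, and NW cost growth read from Figure 39. That relationship is an assumption made for this illustration, not a statement taken from the figures.

```python
# All values are in millions of dollars, as quoted in the text.
definitized_estimate = 313.7   # OW instance, Figure 39
am_radical_cost = 271.1        # Total instance, Figure 40
cost_growth = {"NG": 7.7, "G": 47.1, "NW": 66.8}  # Figure 39

# Backcasted AM Radical cost compared to the definitized estimate (the budget).
under_budget = definitized_estimate - am_radical_cost
under_budget_pct = 100 * under_budget / definitized_estimate

# Assumption for the cross-check: actual cost = definitized estimate + cost growth.
actual_cost = definitized_estimate + sum(cost_growth.values())
savings_vs_actual_pct = 100 * (actual_cost - am_radical_cost) / actual_cost

print(f"Under budget by ${under_budget:.1f} million ({under_budget_pct:.1f}%)")
print(f"Savings compared to actual cost: {savings_vs_actual_pct:.1f}%")
```

Under that assumption, the savings relative to actual cost agree with the 37.7% displayed at the center of Figure 40, which suggests the values read from the two graphs are mutually consistent.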

Figure 40. Actual versus AM Radical, Work, Ship Solar Graph
(from J. Kornitsky, personal communication, November, 2013)

11. LOD and Availability Density Bubble Charts

Lost operating days (LOD) have long been considered by the U.S. Navy ship
maintenance metrics groups to be a valuable indication of the performance of the ship
maintenance process. The LOD metric is often included in reports made by regional
maintenance centers (RMC) to Naval Sea Systems Command (NAVSEA) as an
indication of the effect that delays have had on a ship’s schedule (M. Leftwich,
personal communication, September 4, 2013). However, the LOD metric has been linked
to the quality of the definitized estimate, and that quality has been determined to be
random (T. Laverghetta, personal communication, November 26, 2013). To the
executive-level decision maker, the important aspect of ship maintenance is cost.
Availability density was suggested as an alternative metric for predicting cost and was
provided to this study for further analysis (P. Pascanik, personal communication,
November 21, 2013). Of the following three figures (Figures 41-43), the first two
highlight the lack of correlation between the LOD metric and actual cost. The third
demonstrates the validity of using the availability density metric to indicate actual
maintenance cost.

The following three figures are structured as XY scatter plots. The X-axis
represents expense, or actual cost, of a ship availability and ranges from $0 to $70
million. In the first two figures, the Y-axis represents the total LODs incurred during
an availability and ranges from 0 to (-107), negative to represent operating days lost.
The data points scattered throughout the chart represent the LOD and expense values for
the individual availabilities and are labeled with their unique availability
identification numbers. For example, the data point labeled Avail 56387 in Figure 41,
near the center of the bubble chart, represents one of the availabilities for the Donald
Cook. The LODs incurred during that availability totaled 63 and the total expense was
$33.4 million.

a.  LOD  versus  Expense  (Actual  Cost) 

Figure 41, LOD versus Expense (Actual Cost), is presented first to provide
an introduction to the structure of the bubble chart. The next figure, LOD versus Expense
(Actual Cost) - Highlighted (Figure 42), highlights a cluster of availabilities not
correlated with the rest, which should therefore be considered important to the U.S.
Navy ship maintenance executive-level decision maker interested in controlling costs.

Figure 41. LOD versus Expense (Actual Cost) Bubble Chart
(from J. Kornitsky, personal communication, November, 2013)

b. LOD versus Expense (Actual Cost) - Highlighted

Figure 42, LOD versus Expense (Actual Cost) - Highlighted, is important
to the executive-level decision maker because it demonstrates that the LOD metric is not
correlated with the actual cost of an availability and is therefore of little use in
forecasting it. This is shown both visually, by analysis of the chart, and
mathematically, by calculation of the correlation factor.

Visually,  the  data  points  show  that  smaller  availabilities,  less  than  $30 
million,  can  result  in  either  the  highest  number  or  the  lowest  number  of  LODs.  For 
example,  the  data  point  labeled  Avail  52371  in  Figure  42  is  for  the  James  E.  Williams 
and  indicates  an  actual  cost  of  $4.2  million  with  a  total  of  0  LODs.  Meanwhile,  the  data 
point  labeled  Avail  57133  is  for  one  of  the  Arleigh  Burke  availabilities  and  indicates  an 
actual  cost  of  $9.1  million  with  a  total  of  107  LODs.  In  fact,  the  six  data  points 
highlighted  in  the  lower  left  corner  of  the  bubble  chart  all  represent  availabilities  of 
relatively  small  cost  that  incurred  relatively  high  numbers  of  LODs,  which  prevent  the 
appearance  of  a  linear  relationship.  Therefore,  the  LOD  metric  is  not  a  good  indicator  of 
availability  cost. 

Mathematical  calculation  also  demonstrates  the  lack  of  connection 
between  the  LOD  metric  and  expense.  The  expense,  or  actual  cost,  of  each  availability 
was  totaled,  to  include  OW,  G,  NW,  and  NG.  Then,  a  correlation  factor  was  calculated 
between  the  cost  of  each  availability  and  the  number  of  LODs  incurred  during  each 
availability.  The  correlation  factor  is  (-0.14).  This  number  shows  that,  mathematically, 
the LOD metric is not a good indicator of cost. For the executive-level decision maker,
LODs have been visually and mathematically shown not to correlate well with cost.
The metric which does correlate well with cost is availability density, the metric
provided to this study for further analysis.
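
The text does not state how the correlation factor was computed. A minimal sketch follows, assuming it is the standard Pearson correlation coefficient between expense and LODs across availabilities. The function name `pearson` is chosen for this example. Only three availabilities are quoted individually in the text, so the sample below is a small subset used to show the mechanics and is not expected to reproduce the value obtained from all 21 availabilities.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / math.sqrt(var_x * var_y)


# Subset of availabilities quoted in the text: Avail 56387, 52371, and 57133.
expense_millions = [33.4, 4.2, 9.1]
lods = [-63, 0, -107]  # plotted as negative values, as on the Y-axis of Figure 41

r_subset = pearson(expense_millions, lods)
print(f"Correlation for the three quoted availabilities: {r_subset:.2f}")
```

A coefficient near zero indicates little linear relationship between the two quantities, while values near 1 or -1 indicate a strong one.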

Figure 42. LOD versus Expense (Actual Cost) - Highlighted Bubble Chart
(from J. Kornitsky, personal communication, November, 2013)

c. Availability Density versus Expense (Actual Cost)

The  metric  of  availability  density  was  provided  to  this  study  and  is  defined 
as  the  Total  Actual  Man  Days  divided  by  the  Total  Availability  Duration  Days.  In  other 
words,  the  availability  density  number  represents  the  average  number  of  man-days 
performed each calendar day of the availability. For example, in Figure 43, the Stout
(Avail 54703) used 134,254 man-days to complete its availability, which lasted 275
calendar days. The availability density for the Stout is therefore 488. For each
calendar day of the Stout’s availability, an average of 488 man-days were performed.
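
The definition above can be expressed directly as a short calculation. The function name `availability_density` is chosen here for illustration; the inputs are the Stout values quoted in the text.

```python
def availability_density(total_actual_man_days, total_duration_days):
    """Average number of man-days performed per calendar day of an availability."""
    if total_duration_days <= 0:
        raise ValueError("availability duration must be a positive number of days")
    return total_actual_man_days / total_duration_days


# Stout (Avail 54703): 134,254 man-days over 275 calendar days.
stout_density = availability_density(134_254, 275)
print(f"Stout availability density: {stout_density:.0f} man-days per calendar day")
```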

Figure 43, Availability Density versus Expense (Actual Cost), changes one
axis to represent the new metric. The X-axis remains expense, but the Y-axis now
represents availability density and ranges from 55 to 611. For example, the Stout (Avail
54703) data point, near the top right corner of the chart, indicates an average of 488
man-days per availability calendar day and an actual cost of $63.2 million.

A case can be made that availability density is a better indicator of cost,
and the Availability Density versus Expense (Actual Cost) bubble chart (Figure 43)
supports that both visually and mathematically. Visually, availability density correlates
with expense. For example, the data point labeled Avail 57133, near the bottom left
portion of the chart, represents one of the Arleigh Burke availabilities and indicates an
availability density of 85 and expense of $9.1 million. In the diagonally opposite corner,
Avail 54318 represents the Barry and indicates an availability density of 611 and expense
of $70.1 million. Visually, availability density provides a good indication of
availability expense, as can be seen by the linear relationship between the two.

Mathematically,  availability  density  and  cost  correlate  very  well.  The 
expense  of  each  availability  was  again  totaled.  Then,  a  correlation  factor  was  calculated 
between  the  cost  and  availability  density  of  each  availability.  The  correlation  factor  is 
0.98.  This  number  shows  that,  mathematically,  the  availability  density  metric  is  a  strong 
indicator  of  cost.  Availability  density  is  visually  and  mathematically  an  accurate  indicator 
of  cost. 

For the executive-level decision maker, predicting the actual cost of events
in progress is extremely valuable. The availability density metric shows such a strong
correlation to cost that it may be able to predict whether a particular current
availability is expected to meet or exceed the definitized estimate. The ability to
predict the final actual cost of a ship maintenance evolution in progress would enable
decision makers to avert large cost growth by implementing changes earlier in the U.S.
Navy ship maintenance process.

Figure 43. Availability Density versus Expense (Actual Cost) Bubble Chart
(from J. Kornitsky, personal communication, November, 2013)

12. Drill Down Spreadsheets

The  next  two  figures  (Figures  44  and  45)  are  examples  of  drill  down  spreadsheets 
that  can  be  selected  through  any  solar  graph  instance.  The  ship  selected  as  the  target  for 
drill  down  was  the  Barry  and  a  time-analysis  spreadsheet  was  produced  at  two  different 
levels  of  detail.  The  time  analysis  covers  the  actual  cost,  3DP-only  implementation 
backcasted  cost,  and  AM  Radical  implementation  backcasted  cost.  The  titles  of  each 
spreadsheet  indicate  the  levels  of  detail  shown;  Barry  Drill  Down,  3  Levels  of  Detail 
(Figure  44)  shows  three  levels  of  detail  and  Barry  Drill  Down,  4  Levels  of  Detail  (Figure 
45)  shows  four. 

These spreadsheets would be valuable to the executive-level decision maker who
wants to see the numbers that the visualization software translates into intuitive solar
graphs. For example, consider the executive analyzing the Actual versus AM Radical,
Top 5 Ships, Type Expense solar graph, Figure 36. The Barry instance, located at the one
o’clock position on that graph, indicates a backcasted absolute cost of $43.9 million if
all three technologies had been implemented into the ship maintenance process. However,
the decision maker wants to see the absolute values for the actual cost, the 3DP-only
backcasted cost, and the AM Radical backcasted cost together for comparison. Simply
selecting the Barry instance would produce the option to generate detailed
spreadsheets at varying levels of detail. If the decision maker only wanted to see a
little additional detail, then the Barry Drill Down, 3 Levels of Detail (see Figure 44)
option might be selected. If the decision maker really wanted to drill down into the
data, then the Barry Drill Down, 4 Levels of Detail (see Figure 45) spreadsheet could be
generated. Either way, these spreadsheets provide the executive-level decision maker
with drill down capability sufficient to meet the needs of the most detail-oriented
executive.

Figure 44. Barry Drill Down, 3 Levels of Detail Drill Down Spreadsheet
(from J. Kornitsky, personal communication, 2013)

                                                  2.16%  -56.39%   2,183,255   1,968,201     951,907
TOM  Total Original Material    3     5,259,017  11.96%  -61.63%  13,708,911  12,319,508   5,259,017
OM   Original Material          4     1,338,166   3.04%  -57.00%   3,112,015   2,676,332   1,338,166
OSM  Original SubMaterial       4     3,920,851   8.92%  -63.00%  10,596,896   9,643,175   3,920,851
TG   Total Increase (G+NW+NG)   1    13,879,789  31.58%  -37.25%  22,121,268  20,718,678  13,879,769
TO   Total Original             1    30,062,131  68.41%  -37.40%  48,024,491  45,111,101  30,062,131

*** %Growth is AM Radical vs. Actual

Figure 45. Barry Drill Down, 4 Levels of Detail Drill Down Spreadsheet
(from J. Kornitsky, personal communication, 2013)
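
The note in Figure 45 states that %Growth compares AM Radical to Actual. A minimal Python sketch of that calculation follows, using rows recovered from the figure. The assignment of the three dollar columns to actual, 3DP-only, and AM Radical cost is inferred here from the values and from the note, and the name `pct_change` is chosen for this example; neither is taken from the software itself.

```python
def pct_change(baseline, comparison):
    """Percentage change of comparison relative to baseline (negative = savings)."""
    return 100 * (comparison - baseline) / baseline


# Rows recovered from Figure 45, in dollars (column meaning inferred, see above).
barry_rows = {
    "OM":  {"actual": 3_112_015,  "3dp": 2_676_332,  "am_radical": 1_338_166},
    "OSM": {"actual": 10_596_896, "3dp": 9_643_175,  "am_radical": 3_920_851},
    "TO":  {"actual": 48_024_491, "3dp": 45_111_101, "am_radical": 30_062_131},
}

for code, row in barry_rows.items():
    growth_3dp = pct_change(row["actual"], row["3dp"])
    growth_am = pct_change(row["actual"], row["am_radical"])
    print(f"{code}: 3DP-only {growth_3dp:.2f}%, AM Radical {growth_am:.2f}%")
```

The AM Radical percentages computed this way agree with the %Growth column of the figure to within rounding of the last displayed digit.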

F. SUMMARY

Visualization tools make it easier for executive-level decision makers to
determine the status, path, and origin of ship maintenance costs. NPS researchers were
requested to identify new ways of summarizing millions of data points that are critical
in making maintenance decisions. The visualization software illustrated how additional
tools for big data could provide the diagrams, charts, and graphs to facilitate
maintenance cost allocation decisions. In addition, we have provided a methodology to
help mitigate the risk and uncertainty in decision making.

The  big  data  collected  and  stored  by  STIMS  is  a  giant  trove  of  information.  In 
this  limited  study,  only  19  DDGs  and  21  availabilities  were  analyzed.  Primary  drivers  of 
cost  growth  and  possible  sources  of  cost  savings  were  identified.  Consider  the  use  of  big 
data  visualization  methods  for  not  just  every  ship  in  the  U.S.  Navy,  but  also  in  the  U.S. 
Coast  Guard.  These  methods  could  also  be  applied  to  aircraft  and  ground  vehicle 
maintenance. The scope is expandable to any system which collects big data. The ability
to intuitively analyze large amounts of information and gain deeper understanding of the
relationships among the aspects of an entire system is what makes big data visualization
so important for everyone, including U.S. Navy ship maintenance executive-level
decision makers.

V. CONCLUSIONS AND RECOMMENDATIONS


A.  CONCLUSIONS 

PEO  SHIPS  asked  the  team  from  NPS  to  work  with  U.S.  Navy  ship  maintenance 
metrics  groups  to  provide  additional  options  regarding  the  optimization  of  large  data  sets. 
The use of static, cumbersome spreadsheets is no longer suitable for executive-level
decision makers to make strategic choices regarding ship maintenance budgeting and
scheduling because a better option is available. Big data visualization was the technique
chosen  for  analysis  because  of  its  ability  to  create  higher  levels  of  visual  clarity  through 
diagrams,  graphs,  and  charts.  The  visualization  software  used  to  present  ship  maintenance 
big  data  provides  a  means  to  aggregate  voluminous  data  in  visually  intuitive  ways  to 
better  understand  cost  drivers  and  factors  which  lead  to  schedule  overruns.  Big  data 
visualization  allows  decision  makers  to  identify  trends  quickly,  develop  a  better 
understanding  of  the  problem  space,  establish  defensible  baselines  for  monitoring 
activities,  perform  forecasting,  and  evaluate  useful  metrics. 

The  visualization  software  provides  decision  makers  with  tools  that  make  quick 
identification  of  trends  possible.  Refer  to  Figures  32  and  33  (Definitized  Estimate  vs. 
Actual,  Top  5  Ships,  Type  Expense  and  Work,  respectively)  or  Figure  38  (Definitized 
Estimate  vs.  Actual,  Type  Expense,  Work).  In  the  example  scenarios  presented,  an 
executive  level  decision  maker  was  interested  in  identifying  the  largest  cost  contributor. 
Visual  analysis  of  the  figures  led  the  decision  maker  to  quickly  identify  that 
subcontractor  labor  resulting  from  NW  caused  a  trend  of  higher  costs.  Quick 
identification  of  factors  leading  to  higher  costs  is  an  example  result  of  the  use  of  big  data 
visualization  tools. 

Better  understanding  of  the  problem  space  is  also  provided  by  the  visualization  of 
big  data.  Before  a  decision  maker  can  make  choices  about  the  future  of  U.S.  Navy  ship 
maintenance,  they  must  be  able  to  understand  the  characteristics  of  the  problem  as  a 
whole. Charts, diagrams, and solar graphs enable executives to visualize how all the
data points relate to each other, to define which categories of data are of particular
interest, and to forecast the impact of policy changes. Visualization tools thus permit
decision makers to develop a better understanding of their specific problem space.

Continued  collection  of  ship  maintenance  big  data  would  provide  for  the  creation 
of  defensible  cost  and  schedule  performance  baselines.  The  sample  data  analyzed  in  this 
project  represented  one  type  of  ship  and  a  small  number  of  availabilities  and  is,  therefore, 
limited  in  its  ability  to  represent  U.S.  Navy  ship  maintenance  as  an  industry.  For  the 
scope  of  this  project,  the  sample  size  was  sufficient  to  demonstrate  the  value  of  big  data 
visualization  tools.  However,  to  more  accurately  reflect  the  U.S.  Navy  ship  maintenance 
industry,  expanded  and  continued  collection  of  ship  maintenance  big  data  is  necessary.  If 
the  collection  of  data  were  expanded  to  include  all  types  of  ships  and  continued  to 
provide  for  the  analysis  of  many  years,  then  the  visualization  software  could  be  used  to 
create  defensible  cost  and  schedule  performance  baselines. 

Executive-level  decision  makers  are  often  concerned  with  the  future  impact  of 
their  current  policy  change  choices.  Historically,  executives  relied  upon  the  advice  of 
experts  and  instincts  developed  over  several  years  of  personal  experience  to  select  which 
policy  changes  would  create  the  effects  desired.  Through  big  data  visualization  tools, 
manipulation  of  the  data  is  possible  to  allow  for  forecasting.  In  the  simulations,  which 
examined  the  implementation  of  either  3DP  technology  only  or  the  combination  of 
multiple  technologies  (3DP,  3D  LST,  and  CPLM),  cost  savings  trends,  derived  from 
previous  research  of  those  technologies,  were  applied  to  historical  ship  maintenance  data. 
The  results  were  graphically  presented  in  a  manner  which  allowed  a  decision  maker  to 
intuitively  understand  the  forecasted  effect  without  the  need  for  expensive  test  cases  or 
extensive  research  by  experts.  These  graphs  can  be  seen  as  screen  shots  from 
visualization  software. 
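
The backcasting step described above, in which cost savings rates are applied to historical costs, can be sketched as follows. The savings rates in this example are not taken from the prior research on the technologies; they are inferred from the -57.00% and -63.00% AM Radical entries shown for the Barry in Figure 45 and serve only to illustrate the mechanics. The function name `backcast` and the dictionary layout are likewise assumptions of this sketch.

```python
def backcast(actual_costs, savings_rates):
    """Apply a fractional savings rate per category to historical actual costs.

    Categories without a listed rate are left unchanged.
    """
    return {
        category: cost * (1 - savings_rates.get(category, 0.0))
        for category, cost in actual_costs.items()
    }


# Barry actual costs in dollars for two categories shown in Figure 45.
barry_actual = {"OM": 3_112_015, "OSM": 10_596_896}

# Illustrative AM Radical savings rates, inferred from Figure 45 (see above).
am_radical_rates = {"OM": 0.57, "OSM": 0.63}

barry_backcast = backcast(barry_actual, am_radical_rates)
total_savings = sum(barry_actual.values()) - sum(barry_backcast.values())

for category, cost in barry_backcast.items():
    print(f"{category}: backcasted cost ${cost:,.0f}")
print(f"Total backcasted savings: ${total_savings:,.0f}")
```

Applying the same function with a different set of rates would produce the 3DP-only scenario, which is why a single historical data set can support several side-by-side comparisons.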

Metrics provide an indication of performance as long as they represent a causal
relationship. LODs have long been used as a metric to indicate ship maintenance
performance, but the validity of the LOD metric was questioned because it was
determined to be linked to the quality of the definitized estimate, a random factor.
Availability density was offered as an alternative metric, but proof of its validity was
necessary before it could be considered a real substitute for the LOD metric. Through
the use of bubble charts, the visualization software created a visually intuitive display
which demonstrated the correlation to expense of each of the metrics. The LOD metric was
shown to be a poor indicator of cost and the availability density metric was shown to be
a good indicator of cost.

Through  the  use  of  big  data  visualization  tools,  executive-level  decision  makers 
can  identify  trends  quickly,  develop  a  better  understanding  of  the  problem  space, 
establish  defensible  baselines  for  monitoring  activities,  perform  forecasting,  and  evaluate 
useful  metrics. 

B.  RECOMMENDATIONS 

The  visualization  of  big  data  is  beneficial  to  executive-level  decision  makers 
responsible  for  implementing  policy  throughout  their  enterprise.  U.S.  Navy  ship
maintenance  decision  makers  who  want  to  improve  the  speed  and  accuracy  of  their
decisions  should  consider  the  use  of  visualization  software  in  their  industry.  The
following  additional  recommendations  are  made  to  optimize  the  use  of  big  data 
visualization  in  ship  maintenance: 

•  Continued  collection  of  data.  Data  which  reflects  ship  maintenance  over 
time  will  provide  greater  value  and  more  defensible  baselines. 

•  Expanded  collection  of  data.  Data  which  reflects  all  types  of  ships  in  the 
U.S.  Navy  would  better  reflect  the  industry  and  better  characterize  the 
problem  space. 

•  Identify  performance  accounting  software  for  tracking.  Software  packages 
are  available  which  would  provide  for  a  systematic,  common,  and 
seamless  method  for  collecting,  storing,  and  analyzing  performance  data. 

•  Begin  forecasting  once  more  accurate  performance  baselines  are 
established.  Forecasting  the  effects  of  policy  decisions  is  only  as  accurate, 
and  therefore  valuable,  as  the  baselines  used  to  derive  the  forecast. 
Continued  and  expanded  collection  of  data  in  a  common  software  package 
over  a  period  of  time  must  be  accomplished  before  value  can  be  obtained 
through  forecasting. 

In  addition  to  the  visualization  of  big  data,  U.S.  Navy  ship  maintenance  decision 
makers  would  also  benefit  from  the  development  of  a  meaningful  numerator  for 
evaluating  ship  maintenance  performance.  Return  on  investment  (ROI)  is  calculated  by 
dividing  the  output  by  the  input.  U.S.  Navy  ship  maintenance  collects  troves  of  data  on 
the  input,  the  denominator,  in  the  form  of  dollars  of  cost.  However,  no  output,  or
benefit,  derived  from  ship  maintenance  is  recorded  as  a  metric  and  represented  in
generic  units  of  output.  Without  a  numerator,  the  ROI,  the  return  on
taxpayer  investment,  of  U.S.  Navy  ship  maintenance  cannot  be  determined. 
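
The ROI definition above, output divided by input, can be written as a short function that makes the missing-numerator problem explicit. This is a sketch only: the function name, the use of `None` to represent an unmeasured output, and the sample figures are assumptions made for illustration.

```python
from typing import Optional


def return_on_investment(output_units: Optional[float],
                         input_dollars: float) -> Optional[float]:
    """ROI as defined in the text: output (numerator) over input (denominator).

    Returns None when no output has been measured, because the ratio
    cannot be determined without a numerator.
    """
    if input_dollars <= 0:
        raise ValueError("input_dollars must be positive")
    if output_units is None:
        return None
    return output_units / input_dollars


if __name__ == "__main__":
    # Hypothetical figures: cost is recorded, but no output metric exists.
    print(return_on_investment(None, 2_500_000.0))
    # With a hypothetical output measure, the ratio becomes computable.
    print(return_on_investment(3_000_000.0, 2_500_000.0))
```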

Through  the  implementation  of  these  recommendations,  U.S.  Navy  ship 
maintenance  executive-level  decision  makers  would  be  well  on  their  way  to  deriving  the 
benefits  of  big  data  through  visualization.  Those  benefits  include  the  ability  to  identify 
trends  quickly,  develop  a  better  understanding  of  the  problem  space,  establish  defensible 
baselines  for  monitoring  activities,  perform  forecasting,  and  evaluate  metrics  for  use. 

APPENDIX.  BIG  DATA  IMPLICATIONS  FOR  ENTERPRISE  ARCHITECTURE

The  following  excerpt  is  taken  from  Big  Data  Implications  for  Enterprise 
Architecture,  an  unpublished  manuscript  by  Isaac  Donaldson  submitted  in  partial 
fulfillment  of  the  requirements  for  the  degree  of  Master  of  Science  in  Network 
Operations  and  Technology  at  Naval  Postgraduate  School,  2013. 

Introduction 

By  April  2011,  the  United  States  (U.S.)  Library  of  Congress  had  collected  235
terabytes  of  data.  However,  15  of  the  17  U.S.  business  sectors  had  more  data
stored  per  company  than  the  Library  of  Congress  [McKinsey  Global  Institute  (MGI),
2011].  MGI's  report  on  big  data  (2011)  estimated  that  a  60%  increase  in  the  operating
margins  of  retailers  would  be  possible  if  big  data  collection,  storage,  and  analysis 
techniques  were  properly  utilized.  So,  how  does  an  enterprise  derive  value  from  big  data? 

Big  data  originates  from  many  sources  and  possesses  characteristics  which  are
pertinent  to  the  practitioner  of  enterprise  architecture  (EA).  The  needs  of  the  enterprise 
and  how  the  data  is  to  be  processed  determine  how  an  EA  should  be  designed  to  ensure
the  enterprise  can  derive  value  from  big  data.  The  impact  of  big  data  results  from  the 
volume,  variety,  velocity,  and  value  traits  of  the  data  and  influences  both  the  network  and 
capacity  considerations  of  the  EA.  However,  obstacles  to  implementation  exist  and  are 
either  technical  or  human,  each  of  which  requires  a  different  approach.  Should  an 
architect  carefully  consider,  plan,  and  implement  an  EA  designed  to  accommodate  big 
data,  an  enterprise  could  derive  value  that  affects  the  bottom  line. 

Big  Data 

Big  data  is  generated  by  a  variety  of  sources.  The  sources  from  which  big  data 
originate  include  industry  specific  transactions,  machine/sensor  indications,  web 
applications,  and  text  (Ferguson,  2013).  Industry  specific  transactions  can  include  call
records  and  geographic  location  data.  Machines  generate  extremely  large  volumes  of
information  every  day,  ranging  in  complexity  from  simple  temperature  readings  to
the  performance  parameters  of  a  gas-turbine  engine.  Big  data  on  the  web  ranges  in
format  from  machine  language  to  customer  comments  on  social  networks  and  is  also
produced  in  large  volumes.  Text  sources  can  include  archived
documents,  external  reports,  or  customer  account  information  (Ferguson,  2013). 

Because  big  data  comes  from  a  variety  of  sources,  it  also  possesses  characteristics 
which  distinguish  it  from  data  in  the  traditional  context.  Common  terms  used  to  define 
the  qualities  of  big  data  include  volume,  variety,  velocity,  and  value  (Dijcks,  2013).  From 
the  listing  of  sources  above,  one  can  understand  that  the  volume  of  data  generated  on  a 
daily  basis  is  enormous.  For  example,  Dijcks  (2013)  stated  that  just  a  single  jet  engine 
produces  10  terabytes  of  data  in  30  minutes.  Extrapolate  that  example  to  include  all  the 
aircraft  currently  airborne,  then  include  all  the  factory  infrastructure  around  the  globe 
collecting  data  on  production,  service  life,  and  maintenance  requirements,  and  the 
enormity  of  big  data  volumes  begins  to  emerge.  Another  characteristic  of  big  data, 
variety,  can  be  directly  translated  from  the  various  sources  into  the  variety  of  data 
formats.  In  the  context  of  EA,  various  data  formats  require  additional  consideration  to
ensure  the  ability  of  all  systems  to  share  data.  Velocity,  which  is  related  to  volume,  is  the 
frequency  with  which  big  data  is  created.  To  illustrate  velocity,  compare  the  small  size
of  a  single  Twitter  post  (140  characters)  with  the  large  number  of  posts  generated  in  a  given
time  period  (Dijcks,  2013).  Finally,  value  is  the  feature  of  big  data  which  is  important  to
the  enterprise. 
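
The jet engine example above can be turned into an average data rate with simple arithmetic. The sketch below assumes a decimal terabyte (10^12 bytes), which the source does not specify; under that assumption, 10 terabytes in 30 minutes works out to roughly 5.6 gigabytes per second. The function name is hypothetical.

```python
BYTES_PER_TERABYTE = 10 ** 12  # decimal terabyte; an assumption of this sketch


def average_rate_bytes_per_second(terabytes, minutes):
    """Average data rate for a given volume produced over a given duration."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return terabytes * BYTES_PER_TERABYTE / (minutes * 60)


if __name__ == "__main__":
    rate = average_rate_bytes_per_second(10, 30)
    print(f"{rate / 1e9:.2f} GB/s for a single engine")
```

Multiplying that rate by the number of engines in service gives a rough sense of the volume and velocity an EA would have to absorb.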

Big  Data  is  Valuable  to  the  Enterprise 

Big  data  can  provide  value  to  an  enterprise  through  various  means.  Processing 
and  then  analyzing  big  data  can  help  an  enterprise  better  understand  its  business,  the 
environment  in  which  it  operates,  and  its  customers  (Dijcks,  2013).  Having  developed  an 
enhanced  perception  of  itself  and  the  marketplace,  an  enterprise  could  stand  poised  to 
improve  productivity,  increase  competitive  advantage,  or  develop  superior  product 
innovation  processes  (Dijcks,  2013).  All  these  benefits  can  translate  into  significant 
impacts  on  the  bottom  line.  The  benefits  which  can  be  derived  from  big  data  are  unique 
to  the  specific  enterprise  and,  therefore,  the  manner  in  which  EA  design  is  approached  is 
also  unique.  However,  in  general,  the  proper  collection,  storage,  and  analysis  of  big  data
is  instrumental  in  the  ability  of  the  enterprise  to  reap  value  from  it. 

Impact  of  Big  Data  on  EA 

An  EA  must  be  designed  properly  to  give  an  enterprise  the  capability  to
derive  value  from  big  data.  The  characteristics  of  big  data  -  volume,  variety,  velocity,  and 
value  -  must  all  be  considered  and  planned  for  during  the  design  or  update  of  an  EA  if 
the  enterprise  wishes  to  use  big  data  to  generate  value.  Bakshi  (2012)  breaks  down  the 
EA  considerations  into  two  major  groups,  network  and  capacity. 

Network  considerations  include  data  paths,  scalability,  buffering,  and  latency 
(Bakshi,  2012).  Regarding  the  data  paths,  redundancy  provides  strength  to  an  EA 
designed  for  big  data.  Data  is  often  located  in  multiple  locations  on  an  enterprise’s 
network.  Providing  multiple  paths  among  the  data  locations  improves  the  EA’s  ability  to 
share  data.  Should  data  collection,  storage,  and  processing  needs  increase,  designing  an 
EA  to  be  scalable  would  allow  an  enterprise  the  capability  to  expand  or  contract  as 
necessary.  Considering  the  volumes  with  which  data  will  be  transmitted,  an  EA  with 
sufficient  buffers  and  queues  would  be  beneficial.  Without  those  buffers,  a  network  may 
become  overloaded  with  data  and  slow  down  or  even  crash.  The  final  point  Bakshi 
(2012)  made  regarding  network  considerations  was  that  consistent  and  predictably  low 
latency  must  be  a  trait  of  an  EA  designed  to  handle  big  data. 
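
The buffering consideration above can be illustrated with a bounded queue: when the buffer is full, the sender is told to wait or retry rather than overwhelming the receiver. This is a minimal single-process sketch using Python's standard `queue` module, not a description of any particular network device; the helper names `offer` and `drain` are hypothetical.

```python
import queue


def offer(buffer, item):
    """Try to enqueue an item; return False instead of blocking when full."""
    try:
        buffer.put_nowait(item)
        return True
    except queue.Full:
        return False


def drain(buffer):
    """Remove and return everything currently waiting in the buffer."""
    items = []
    while True:
        try:
            items.append(buffer.get_nowait())
        except queue.Empty:
            return items


if __name__ == "__main__":
    link_buffer = queue.Queue(maxsize=3)  # capacity is an arbitrary example
    accepted = [offer(link_buffer, packet) for packet in range(5)]
    print("accepted:", accepted)
    print("delivered:", drain(link_buffer))
```

Sizing the buffer is the design decision: too small and bursts of data are refused, too large and queued data adds latency, which conflicts with the low-latency requirement.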

Capacity  considerations  involve  dispersed  computing  and  data  locations, 
distributions,  and  volumes  (Bakshi,  2012).  The  last  three  aspects  are  actually  all 
symptoms  of  a  well-planned,  dispersed  computing  EA.  The  main  idea  of  dispersed
computing  is  to  spread  out  the  data  amongst  the  nodes  within  the  enterprise,  possibly 
within  a  separate  big  data  warehouse,  but  more  likely  throughout  the  enterprise.  With  the 
data  distributed  throughout  the  EA,  the  processing  power  requirements  for  big  data 
analysis  can  be  shared  across  the  network.  Each  location  where  data  is  stored  must  be 
able  to  reliably  collect,  store,  and  analyze  large  volumes  of  information.
Therefore,  high  speed,  low  latency  connections  are  key  throughout  the  enterprise  (Bakshi, 
2012).  Knowing  the  implications  of  big  data  for  an  EA  is  one  thing.  Integrating  big
data  into  EA  is  another.  Many  obstacles  exist  which  ensure  the  implementation  of  big
data-minded  EA  is  a  challenge. 
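
The dispersed computing idea described in this section, spreading data across nodes so that analysis is shared, can be sketched as follows. Each node summarizes only its own records, and only the small summaries cross the network to be combined. The sketch simulates nodes as lists within one process; the round-robin placement and all function names are assumptions made for illustration.

```python
def partition(records, node_count):
    """Distribute records across nodes in round-robin order."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    nodes = [[] for _ in range(node_count)]
    for index, record in enumerate(records):
        nodes[index % node_count].append(record)
    return nodes


def local_summary(records):
    """Work done at a node: reduce its records to a (sum, count) pair."""
    return (sum(records), len(records))


def combine(summaries):
    """Merge per-node summaries into an overall mean."""
    total = sum(s for s, _ in summaries)
    count = sum(c for _, c in summaries)
    if count == 0:
        raise ValueError("no records to summarize")
    return total / count


if __name__ == "__main__":
    readings = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical sensor readings
    nodes = partition(readings, node_count=2)
    print("overall mean:", combine([local_summary(n) for n in nodes]))
```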

Major  Obstacles  to  Proper  Implementation  of  Big  Data  into  EA 

The  shift  from  traditional  EA  to  an  EA  which  is  designed  for  big  data  is  both  a 
technological  challenge  and  a  human  challenge  (M2  Presswire,  2012).  The  technology
aspect  involves  an  architect  selecting  and  introducing  IT  into  an  EA  which  may  not  have 
been  originally  designed  to  accommodate  change  or  expansion.  Obstacles  may  include 
incompatible  technologies,  big  data  tools  which  do  not  address  the  particular  needs  of  the 
enterprise,  and  hidden  technology  gaps.  Careful  consideration,  planning,  and 
implementation  of  the  data  and  application  architectures  into  the  existing  EA  is  necessary 
to  remedy  the  existing  (and  avoid  the  creation  of  new)  dysfunctionalities  and/or 
technology  gaps  (M2  Presswire,  2012).  Should  the  technology  aspect  of  EA  redesign  be 
executed  smoothly  and  successfully,  the  human  facet  must  still  be  addressed. 

The  stakeholders,  both  those  who  finance  the  EA  project  and  those  who  are  the 
end-users,  represent  the  human  aspect.  Among  the  decision  makers,  there  may  exist  a 
lack  of  awareness  regarding  the  capabilities  and  risks  associated  with  embarking  upon  an 
EA  project  (M2  Presswire,  2012).  There  also  may  exist  a  resistance  by  the  end-users  to 
change  systems  already  in  place.  Both  of  these  obstacles  can  be  overcome  through 
gaining  stakeholder  buy-in.  Through  education  and  inclusion  in  planning,  both
decision  makers  and  end-users  can  be  persuaded  to  support  big  data  changes  in  EA  (M2
Presswire,  2012).  Another  possible  obstacle  is  the  need  to  change
technology  interfaces  or  processes  to  facilitate  the  integration  of  big  data  into  business
units.  However,  skill  gap  analysis  can  identify  where  disconnects  between  humans  and 
technology  exist  and  training  provides  the  bridge  to  cross  those  gaps. 

Conclusion 

Big  data  exhibits  characteristics  which  require  special  consideration  when 
designing  an  EA.  The  volume,  variety,  velocity,  and  value  of  big  data  must  be  understood 
by  the  EA  practitioner  before  embarking  upon  a  project  so  complex  and  risky.  When 
compared  to  traditional  methods  of  designing  EA,  big  data  requires  that  networks  have
redundant  data  paths,  offer  scalability,  provide  sufficient  buffering,  and  exhibit
consistent,  reliably  low  latency.  As  for  capacity  considerations,  successfully  implemented 
dispersed  computing  environments  deliver  the  necessary  data  locations,  distribution,  and 
volume  handling  capability  required  by  big  data  collection,  storage,  and  analysis.  When 
technical  or  human  obstacles  arise,  an  architect  who  carefully  plans,  conducts  gap
analysis,  acquires  stakeholder  buy-in,  and  provides  necessary  training  can  overcome
those  hurdles.  Should  an  architect  follow  these  guidelines,  an  EA  capable  of  handling  big 
data  could  produce  improved  productivity,  increased  competitive  advantage,  and  superior 
product  innovation  processes  for  its  enterprise. 